This past week has been crazy. I was in a car accident right as school started, and on top of all of that I have started a gig with a freelance PHP development company. I will get you guys some better material; I have a few subjects I am working on that should be very promising later this month.
For now, let’s talk about human verification.
I am an opponent of CAPTCHA-based verification mechanisms. By the time you make them unparseable by OCR engines they have become illegible, which makes legitimate use of your application much more difficult. I am a proponent of easy logic questions instead. Just make the layout polymorphic and unparseable.
The first thing you need is question templates. If we were doing mathematical logic, some examples of templates could be:
‘What is the sum of ? and ?’
‘If you add ? together with ? , what is the result’
‘? is subtracted from ? , the answer is’
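Here is a rough sketch of how templates like these could be stored so the answer can be worked out later. The `$templates` layout, the `answer` callables and the `fillTemplate()` helper are my own assumptions for illustration, not a fixed design. Note that "? is subtracted from ?" means the second number minus the first.

```php
<?php
// Hypothetical template table: each entry pairs the question text with a
// callable that computes the expected answer from the two operands, taken
// in the order they appear in the text.
$templates = [
    [
        'text'   => 'What is the sum of ? and ?',
        'answer' => function (int $a, int $b): int { return $a + $b; },
    ],
    [
        'text'   => 'If you add ? together with ? , what is the result',
        'answer' => function (int $a, int $b): int { return $a + $b; },
    ],
    [
        // "a is subtracted from b" means b - a
        'text'   => '? is subtracted from ? , the answer is',
        'answer' => function (int $a, int $b): int { return $b - $a; },
    ],
];

// Replace each "?" placeholder, left to right, with the next value.
function fillTemplate(string $text, array $values): string
{
    $offset = 0;
    foreach ($values as $value) {
        $pos = strpos($text, '?', $offset);
        if ($pos === false) {
            break; // more values than placeholders
        }
        $value  = (string) $value;
        $text   = substr_replace($text, $value, $pos, 1);
        $offset = $pos + strlen($value); // skip past what was just inserted
    }
    return $text;
}
```

Keeping the arithmetic next to the wording means a new template cannot be added without also saying how its answer is calculated.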
Then you would have a multidimensional array whose key identifies the variable used in the logic question, in this case a number. The dimensions hold different levels of obfuscation, such as medium and hard, with the original array index being the starting number. In this case, I will only have one alternate level of obfuscation, with the worded number being the alt, for example:
$logicVariables[1] = 'one';
// This is just an example of the MD array (it replaces the flat entry above); I wouldn't use this sort of obfuscation as it is confusing
$logicVariables[1] = array('medium' => 'one', 'hard' => 'won');
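As a minimal sketch of the flat (single alternate) version, a lookup could work like this. The `renderVariable()` helper and the 1 to 9 range are assumptions of mine, picked only to keep the example small.

```php
<?php
// Hypothetical lookup table: the index is the number, the value is its
// worded alternate.
$logicVariables = [
    1 => 'one', 2 => 'two',   3 => 'three',
    4 => 'four', 5 => 'five', 6 => 'six',
    7 => 'seven', 8 => 'eight', 9 => 'nine',
];

// Return the number as it should appear in the question: the worded
// alternate when requested and available, otherwise the plain digits.
function renderVariable(int $number, array $logicVariables, bool $useAlt): string
{
    if ($useAlt && isset($logicVariables[$number])) {
        return $logicVariables[$number];
    }
    return (string) $number;
}

// Example: decide at random whether to show "3" or "three".
$shown = renderVariable(3, $logicVariables, (bool) random_int(0, 1));
```

The answer should always be calculated from the numeric index, never from the displayed text.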
Now you will need a scrambler function that can scramble your template and variables a bit, to make them harder to parse. For instance, running one of our sample templates through it could come out like this:
‘What is the sum of ? and ?’
// scrambled ex 1 , hard scramble
‘Wh@t. is the sum; of “?” and /?/’
// scrambled ex 2, soft scramble
‘What is the sum of ‘? and ?’
This function can make passes over the template and randomly change the character it lands on to an alternate obfuscated version like above or, if it lands on a space, add another space or arbitrary characters to try to trip up parsing engines. It can also change the delimiters around the variable placeholder, change capitalization so that poorly coded engines fail, and so on.
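A scrambler along those lines might look roughly like the sketch below. The function name, the substitution map, the noise characters and the delimiter pairs are all placeholders I made up; it also assumes plain ASCII templates. The one rule it keeps is that a "?" placeholder is never added or removed, so the question can still be assembled afterwards.

```php
<?php
// Hypothetical scrambler: each pass picks one random position in the
// template and mutates whatever character it lands on.
function scrambleTemplate(string $template, int $passes, bool $hard = false): string
{
    $substitutions = ['a' => '@', 's' => '$'];          // only used when $hard
    $noise         = ['.', ';', ','];                   // stray punctuation
    $delimiters    = [['"', '"'], ['/', '/'], ['[', ']']];

    for ($i = 0; $i < $passes; $i++) {
        $length = strlen($template);
        if ($length === 0) {
            break;
        }
        $pos  = random_int(0, $length - 1);
        $char = $template[$pos];

        if ($char === '?') {
            // Wrap the placeholder in a random delimiter pair.
            [$open, $close] = $delimiters[random_int(0, count($delimiters) - 1)];
            $replacement = $open . '?' . $close;
        } elseif ($char === ' ') {
            // Soft: double the space. Hard: put stray punctuation before it.
            $replacement = $hard
                ? $noise[random_int(0, count($noise) - 1)] . ' '
                : '  ';
        } elseif ($hard && isset($substitutions[$char])) {
            $replacement = $substitutions[$char];
        } elseif (ctype_alpha($char)) {
            // Flip the case of a letter.
            $replacement = ctype_lower($char) ? strtoupper($char) : strtolower($char);
        } else {
            $replacement = $char; // leave anything else alone
        }

        $template = substr_replace($template, $replacement, $pos, 1);
    }
    return $template;
}
```

I left digit look-alikes (such as "0" for "o") out of the substitution map on purpose, since extra digits in a math question would confuse real users.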
Make this all into an extensible class. Have it start by grabbing a random template out of your template database (hardcoded, SQL, flat config file, etc.) and choosing random variables (the same way), then calculating the answer and saving it in the session (not a cookie, since the client can read and change cookies). Then run the variables and template through your scrambling function, which uses a random value to determine what setting to scramble at and how many passes to make. At the end, assemble the question and display it to the user.
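To give a feel for the flow, here is a bare-bones sketch of such a class. Everything in it is an assumption for illustration: the class name, the `logic_answer` key, the 1 to 9 operand range, templates with exactly two placeholders, and a deliberately tiny stand-in scrambler (case flips and doubled spaces only). The store is passed in by reference so that `$_SESSION` can be handed over in a real request.

```php
<?php
class LogicQuestion
{
    private $templates;

    // Each template is ['text' => string with two "?", 'answer' => callable].
    public function __construct(array $templates)
    {
        $this->templates = array_values($templates);
    }

    // Builds a question, saves the expected answer in $store, and returns
    // the text to show the user.
    public function generate(array &$store, int $maxPasses = 5): string
    {
        $template = $this->templates[random_int(0, count($this->templates) - 1)];
        $a = random_int(1, 9);
        $b = random_int(1, 9);
        $store['logic_answer'] = (string) $template['answer']($a, $b);

        // Scramble first, then assemble, so the numbers are left untouched.
        $text  = $this->scramble($template['text'], random_int(0, $maxPasses));
        $parts = explode('?', $text, 3);
        return $parts[0] . $a . $parts[1] . $b . $parts[2];
    }

    // Compares the user's answer with the stored one. The stored answer is
    // removed either way, so each question allows a single attempt.
    public function check(array &$store, string $given): bool
    {
        $expected = $store['logic_answer'] ?? null;
        unset($store['logic_answer']);
        return $expected !== null && trim($given) === $expected;
    }

    // Stand-in scrambler: flips letter case or doubles a space per pass.
    private function scramble(string $text, int $passes): string
    {
        for ($i = 0; $i < $passes && $text !== ''; $i++) {
            $pos  = random_int(0, strlen($text) - 1);
            $char = $text[$pos];
            if ($char === ' ') {
                $new = '  ';
            } elseif (ctype_alpha($char)) {
                $new = ctype_lower($char) ? strtoupper($char) : strtolower($char);
            } else {
                $new = $char;
            }
            $text = substr_replace($text, $new, $pos, 1);
        }
        return $text;
    }
}

// Possible usage inside a request (after session_start()):
//   $q = new LogicQuestion($templates);
//   echo htmlspecialchars($q->generate($_SESSION));
//   ...and on submit: $ok = $q->check($_SESSION, $_POST['answer'] ?? '');
```

Clearing the stored answer on every check is a design choice, not a requirement; it stops a bot from guessing repeatedly against the same question.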
You should add templates and variables (if needed) to your database often and run QA tests often. Also keep a reporting function that logs incorrect attempts, complete with what the question looked like, what it was before scrambling, the correct answer, and the answer given by the user (maybe even allow comments).
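A reporting function of that kind could be as simple as appending one JSON line per failed attempt to a log file. This is only a sketch; the function name, the field names and the flat-file format are my assumptions, and a database table would do just as well.

```php
<?php
// Hypothetical logger: appends one JSON record per incorrect attempt.
// Returns true when the record was written.
function logFailedAttempt(
    string $logFile,
    string $displayed,   // the question as the user saw it
    string $original,    // the template before scrambling
    string $expected,    // the correct answer
    string $given,       // what the user typed
    string $comment = '' // optional user comment
): bool {
    $record = [
        'time'      => gmdate('c'),
        'displayed' => $displayed,
        'original'  => $original,
        'expected'  => $expected,
        'given'     => $given,
        'comment'   => $comment,
    ];
    $line = json_encode($record) . "\n";
    // FILE_APPEND adds to the end; LOCK_EX avoids interleaved writes.
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

Reading these records back during QA shows which scrambles are tripping up real people rather than bots.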
Guys (and ladies), this isn’t a hard approach, and it would solve all this CAPTCHA BS. If they can’t break your CAPTCHA with OCR, they will outsource it to cheap human solvers (yes, they seriously will), and you cannot stop that. Maybe a time limit or something would help, but then we fall back to usability. If I knew more about how this stuff gets outsourced, I could design you something.
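If you did want to try the time limit idea, a sketch could be as small as this. The helper name and the 120 second default are made up, and I have no data on what limit, if any, actually hurts outsourced solvers more than real users.

```php
<?php
// Hypothetical helper: true when the answer arrived no later than
// $maxSeconds after the question was issued.
function answeredInTime(int $issuedAt, int $now, int $maxSeconds = 120): bool
{
    $elapsed = $now - $issuedAt;
    return $elapsed >= 0 && $elapsed <= $maxSeconds;
}

// Possible usage: save time() in the session when the question is shown,
// then on submit call answeredInTime($_SESSION['logic_issued_at'], time()).
```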
I will add an example class later that can give you an idea on how this all comes together if you are having a problem understanding or visualizing it.