Captcha and ReCaptcha - Why they aren’t as annoyingly pointless as you may think
Captcha forms. Ignoring the atrocious spelling, which seems to be endemic right now (why spell a word correctly when you can leave out some vowels, or maybe just chuck a ‘z’ in? Grrr), the little web forms seem to appear on every website where you need to verify that you are, in fact, human, and not some evil spam-delivering robot intent on online domination.
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. These Captcha forms (or ReCaptcha, as I’ve also seen them referred to) are normally grainy pictures of numbers or words, presented in pairs, and you have to enter the two sets of characters correctly in order to proceed. They are normally used on websites where you are not a signed-in, active user.
Have you ever noticed that when you get Captcha forms, there’s normally one that is really easy to read, whilst the other looks as though it’s been written by a toddler using a pencil for the first time? I’ve lost count of the number of times that I have tapped the details in, completely convinced that I’ve got at least half of the second picture incorrect due to the blurred, overlapping nonsense that is presented to me. The thing is, I hardly ever get them wrong.
I’d like to claim that this is due to a superior intellect and a keen eye for detail. However, I write that safe in the knowledge that just last week I spelt my own name wrong (it had been a long week, please don’t judge me), so it’s definitely not that.
Being slightly geeky, I thought I’d try and find out more about the forms. It turns out that ReCaptcha is not only a way of stopping robots from getting into restricted areas of websites, it’s also used as a way to digitise books. As Wikipedia explains: “The reCAPTCHA service supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects.”
In other words, it’s easy for the human eye to distinguish words and numbers. It’s a basic skill we are taught even before we start preschool. As we grow, the brain is intelligent enough to distinguish various fonts and handwriting, and even on occasions when a letter is badly written or unclear, the mind can unravel the word and work out what it’s supposed to say.
Web companies, for all their might, power and wealth, are unable to duplicate the human brain (and let’s just pause for a moment to thank whoever it is you need to thank that the likes of Google aren’t able to do this). Therefore, they use the human brain to help. When you are viewing Captcha forms, you’ll always see one image which is clear and easy to decipher. This is the one Captcha already knows the answer to, so as long as you get it correct, your Captcha code should work. The second, less clear image is the one that needs working out. The Captcha form presents it to you, you enter what you think it says, and Captcha stores that information. Bear in mind that this happens millions of times, so Captcha can use that information collectively. If 498 people out of 500 have all said that the second picture says a certain word or number, you can be pretty certain the majority are correct, so Captcha automatically marks the previously uncertain word or number as now being known.
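For the geeks among you, here is a rough sketch in Python of how that "one known word, one unknown word" idea could work. To be clear, this is my own illustration and not ReCaptcha's actual code: the function names, the 500-answer minimum and the 99% agreement threshold are all assumptions, loosely based on the 498-out-of-500 example above.

```python
from collections import Counter


def check_control_word(answer, known_text):
    """Pass the user if they typed the already-known (easy) word correctly."""
    return answer.strip().lower() == known_text.strip().lower()


def resolve_unknown_word(guesses, min_votes=500, min_agreement=0.99):
    """Return the agreed reading of the unclear word, or None if undecided.

    min_votes and min_agreement are hypothetical thresholds chosen for
    illustration only.
    """
    if len(guesses) < min_votes:
        return None  # not enough people have answered yet
    normalised = [g.strip().lower() for g in guesses]
    word, count = Counter(normalised).most_common(1)[0]
    if count / len(normalised) >= min_agreement:
        return word  # the overwhelming majority agree
    return None


# 498 people out of 500 read the smudged word the same way.
guesses = ["morning"] * 498 + ["moming", "mourning"]
if check_control_word(" Upon ", "upon"):
    print(resolve_unknown_word(guesses))
```

The point of the sketch is that only the known word decides whether you get through, while the answer to the unclear word is simply collected and counted until enough people agree.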
As Captcha gets its images from scanned books and the like, it is essentially using its millions of users to ‘translate’ for it. It’s a genius way of doing things, whilst offering a worthwhile service in the form of online security via Captcha forms in the process.