Google has acquired reCAPTCHA, a free CAPTCHA service that also serves as a means of digitizing printed books and newspapers. Among other things, the Mountain View web giant is looking to juice its ever-controversial library-scanning Book Search project. Google announced the acquisition this morning with a post to the Official …
reCAPTCHA isn't the best way to protect your site
Against spam, the best approach I've seen so far is this.
An HTML form has many inputs. A bot sees them and just enters something in each one to get the form to submit; since it doesn't know which fields are actually needed, it fills in all of them.
So basically, you have to tell the computer and the human apart based on what they fill in. reCAPTCHAs and the like are slow, ignored, awful and breakable, and they turn people away from sending in vital information your site could need.
What if you build an HTML form in which one section is hidden using style attributes (position: absolute and visibility: hidden)? That section contains <input type=text> fields, and the idea is that you don't fill them in.
Because the stylesheet is what obscures them, they would be visible without a stylesheet, or to anything with no comprehension of a stylesheet.
So basically, along comes the human with his web browser, fills in the fields he can see and sends the form away. All the ignore fields are empty: obviously a human. A computer comes along, sees the hidden fields, has no ability to read the CSS or know they are supposed to be hidden, and fills in the ignore fields, thereby tripping your validation into knowing it is a computer.
It's simple, but I think it's FAR more effective than captchas.
Can anyone find a flaw in that plan? It sounds perfect so far.
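For anyone who wants to try it, here is a minimal server-side sketch of that check in Python. The field names (`website`, `fax_number`) and the `looks_like_bot` helper are made up for illustration; any names would do, as long as the stylesheet hides those fields from human visitors.

```python
# Hypothetical names of the CSS-hidden "ignore" fields (not from any real site).
HONEYPOT_FIELDS = ("website", "fax_number")


def looks_like_bot(form: dict) -> bool:
    """Return True if any hidden field came back with something in it."""
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)


# A human never sees the hidden fields, so they arrive empty.
human = {"name": "Chris", "comment": "Hello", "website": "", "fax_number": ""}
# A naive bot fills in every field it finds in the markup.
bot = {"name": "x", "comment": "buy pills", "website": "http://spam.example", "fax_number": "123"}

print(looks_like_bot(human))  # False
print(looks_like_bot(bot))    # True
```

The check has to run on the server; anything done in the page itself can simply be skipped by the bot.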
Re:Chris Thomas Alpha
If the human's browser can see the CSS, which it must in order to present the page correctly, then so can any computer on the Internet, including the bot. What then?
Besides, one would have to identify, in the HTML form, the fields that are supposed to be hidden. Unless it's done differently (or at least randomly) each and every time, the bad guys will learn which key words to avoid. Does not sound very comfortable to work with.
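One way the "done differently each and every time" idea could look, as a rough Python sketch: derive the hidden field's name from a random per-form token, so there is no fixed keyword to learn. `SECRET_KEY`, the `token` field and the `f_` prefix are all assumptions made for this example, not part of any existing system.

```python
import hashlib
import hmac
import secrets

# Assumption: a per-site secret that never leaves the server.
SECRET_KEY = b"change-me"


def new_form_token() -> str:
    """Random token, sent with each rendered form in an ordinary hidden input."""
    return secrets.token_hex(8)


def honeypot_name(token: str) -> str:
    """Derive this form's hidden-field name from its token."""
    digest = hmac.new(SECRET_KEY, token.encode(), hashlib.sha256).hexdigest()
    return "f_" + digest[:12]


def looks_like_bot(form: dict) -> bool:
    """True if the honeypot field belonging to this form's token was filled in."""
    token = form.get("token", "")
    return bool(form.get(honeypot_name(token), "").strip())


token = new_form_token()
print("render a CSS-hidden input named:", honeypot_name(token))
```

This only answers the learned-keyword objection. A bot that actually evaluates the CSS, as described above, would still see which field is hidden.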
CaPTcHa iS A pITa!
... it's the equivalent of "L33t sp34k" (aka "Moron Text").
I have good visual skills, I am very good at spatial recognition tests, I am reasonably intelligent and can comprehend written language, etc.
So why the fuck do I *STILL* have to make repeated attempts to get a web page to agree that I've keyed in one of these bloody things properly?
I fully expect a lot of people to say "sod it, I'll go somewhere else" rather than keep wasting their time being told that they've not typed the munged text correctly.
A Turing test I once saw at tech college
simply asked "What is this a picture of", and showed a small image of a common everyday item, e.g. a cat, dog, fork, tree, orange, car etc. It seemed to be fairly intelligent, too: for example, it would accept "car", "vehicle", "sedan" for the car pic, and "orange", "citrus", "fruit" for the orange pic. It also allowed for articles and prepositions, e.g. "an orange", "this is a car" etc. With a database of several thousand pictures, my thought was the chance of a bot getting a lucky guess on that would be pretty slim.
It was a student's programming project; what became of it I never found out, and I've never seen it used in the field. But it would be far superior to the current mangled-text captchas I've seen everywhere!
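I never saw the student's code, so the following is only a guess at how the answer matching described above might work, sketched in Python. The `ACCEPTED` table, the `FILLER` word list and the file names are invented for the example.

```python
import re

# Hypothetical answer table: picture file -> words accepted for it.
ACCEPTED = {
    "car.jpg": {"car", "vehicle", "sedan"},
    "orange.jpg": {"orange", "citrus", "fruit"},
}

# Assumed list of filler words to ignore before matching.
FILLER = {"a", "an", "the", "this", "that", "is", "it", "it's", "of", "picture"}


def is_correct(picture: str, answer: str) -> bool:
    """Accept the answer if, once filler words are dropped, exactly one word
    remains and that word is on the picture's accepted list."""
    words = re.findall(r"[a-z']+", answer.lower())
    meaningful = [w for w in words if w not in FILLER]
    return len(meaningful) == 1 and meaningful[0] in ACCEPTED.get(picture, set())


print(is_correct("car.jpg", "This is a car"))    # True
print(is_correct("orange.jpg", "an orange"))     # True
print(is_correct("car.jpg", "dog"))              # False
```

Requiring exactly one meaningful word is a design choice of this sketch: it stops a bot from submitting a long list of common nouns and hoping one of them matches.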
@Chris Alpha Thomas
The problem with what you described is accessibility. Browsers that display content in a non-traditional manner (i.e. screen readers) cannot be relied upon to work out that an element would not be displayed in a traditional browser and that they therefore should not read it to the user. In the case of a form field obscured or hidden with CSS, the user would then likely fill it in and be rejected.
There are several good ways of reducing spam from web forms, but all of them are a direct trade-off against accessibility, and, particularly on corporate and governmental sites, inaccessibility is a big No-No.
Captcha on CodeProject
I found the best Captcha on CodeProject.
Rather than rendering a very ugly, difficult-to-read image, it builds the image using text and CSS (building letters from '1's, for example), so it's perfectly readable for a human but bloody awful for anything else. You can't copy and paste the image (as it's not an image), etc.
Unfortunately I can't remember the name of it...
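To give a flavour of the "letters built from '1's" idea, here is a plain-text Python sketch. It is not the CodeProject control (which I can't name, and which used HTML and CSS); the 3x5 `GLYPHS` font and the `render` helper are made up for illustration and only cover digits.

```python
# Made-up 3x5 bitmap font: each glyph is five rows of '1's and spaces.
GLYPHS = {
    "0": ("111", "1 1", "1 1", "1 1", "111"),
    "1": (" 1 ", "11 ", " 1 ", " 1 ", "111"),
    "2": ("111", "  1", "111", "1  ", "111"),
    "3": ("111", "  1", "111", "  1", "111"),
    "4": ("1 1", "1 1", "111", "  1", "  1"),
    "5": ("111", "1  ", "111", "  1", "111"),
    "6": ("111", "1  ", "111", "1 1", "111"),
    "7": ("111", "  1", "  1", "  1", "  1"),
    "8": ("111", "1 1", "111", "1 1", "111"),
    "9": ("111", "1 1", "111", "  1", "111"),
}


def render(code: str) -> str:
    """Draw the digits side by side, one text row at a time."""
    rows = []
    for r in range(5):
        rows.append("  ".join(GLYPHS[ch][r] for ch in code))
    return "\n".join(rows)


print(render("42"))
```

Copying the output gives a grid of '1's rather than the code itself, which is the effect described above.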
ASIRRA - Animal Species Image Recognition for Restricting Access
Have a look at http://research.microsoft.com/en-us/um/redmond/projects/asirra/ , where you have to identify which within a set of ten pictures are cats or dogs. It's been out for some time, but I haven't come across it in Real Life...
The new digital impressment
So when serving CAPTCHAs to websites, reCAPTCHA may insert 1-2 extra unknown words from Google Books. The users decipher a set of known and unknown words to access their websites, while Google Books gets to complete their plans for world domination.
Hang on, so only half the CAPTCHA being sent to me is actually to validate me. The other half is to further their own business venture, which I haven't agreed to and for which I get no compensation (regardless of how noble their venture might be). In that case, when I get a two-part CAPTCHA where the first half looks like a bad OCR scan, maybe I will enter the correct answer to the second half and a deliberately wrong answer to the first half, just to see whether it accepts it. Hope that doesn't mess up Google's books...
Missing The Point
People with the attitude that "all forms of protection are as viably breakable by humans as by computers" miss the crucial point: that applies just as much to their beloved CAPTCHAs as to any other method that works reasonably well and is less discriminatory. There is no such thing as "security by obscurity" where a visible protective barrier is concerned.
Repeat after me: those who stand to gain by breaking or circumventing any protective device will do so. CAPTCHA is a classic example of starting with the wrong assumption (that the abuse is automated) and arriving at the wrong solution (distinguishing humans from machines). But, alas, it's sufficiently widespread, and the craze sufficiently well-stuck, that people go on using this substandard technique as overkill in place of much simpler, workable methods of ensuring the same thing.
Re the comments about hiding honeypot fields with CSS: good plan. I'm blind, and can assure you that most screen readers treat hidden elements as hidden (the biggest danger is to text-mode browser users), but there's also the rather simpler possibility of just asking people not to fill in certain text fields or interact with certain controls. Even so, don't trust the user, and do your validation on the server side: for instance, make posting a comment conditional on having loaded the post page, rate-limit submissions after the page is read, take the commenter's email address, and so on.
PS: this is no get-out clause to website owners, but http://www.solona.net/ . Great service. There's also WebVisum, a Firefox plugin that includes CAPTCHA solving. If you can volunteer your time, please do so for these, so blind and other challenged users aren't denied access to CAPTCHA-protected resources, or encourage said resource owners to think again and perhaps implement some alternative or added solution for such users.
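The server-side checks suggested above (must have loaded the post page, and limited by rate after reading) could be sketched like this in Python. The two thresholds, the in-memory dictionaries and the function names are assumptions for illustration; a real site would keep this state somewhere persistent.

```python
import time

MIN_READ_SECONDS = 5.0     # assumed: minimum time between page load and post
MIN_POST_INTERVAL = 30.0   # assumed: minimum gap between accepted posts

page_loads = {}   # (client, post_id) -> time the post page was served
last_post = {}    # client -> time of the last accepted comment


def record_page_load(client, post_id, now=None):
    """Call when the post page is served to a client."""
    page_loads[(client, post_id)] = time.time() if now is None else now


def may_post(client, post_id, now=None):
    """Accept a comment only if the page was loaded, read for a while,
    and the client is not posting too quickly."""
    now = time.time() if now is None else now
    loaded = page_loads.get((client, post_id))
    if loaded is None:
        return False                      # never fetched the post page
    if now - loaded < MIN_READ_SECONDS:
        return False                      # submitted faster than anyone reads
    previous = last_post.get(client)
    if previous is not None and now - previous < MIN_POST_INTERVAL:
        return False                      # over the rate limit
    last_post[client] = now
    return True


record_page_load("alice", 42, now=0.0)
print(may_post("alice", 42, now=2.0))    # False
print(may_post("alice", 42, now=10.0))   # True
```

None of this asks the visitor to do anything, so it costs nothing in accessibility.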
@ Dale, that's funny ...
and I'm glad to see I'm not the only one who thought of that.
Won't impact Google at all. The reCAPTCHA system doesn't accept the word of one person. It waits until three* or more people put in the same word for a given CAPTCHA, then asserts that that result is the correct one.
*I think it's three. Feel free to correct me.
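In Python, the agreement rule as I understand it might look roughly like this. It is not reCAPTCHA's actual code, and the threshold of three is only my guess from above; `consensus` and `AGREEMENT_THRESHOLD` are names invented for the sketch.

```python
from collections import Counter

AGREEMENT_THRESHOLD = 3  # the guess from the comment above, not a confirmed figure


def consensus(answers, threshold=AGREEMENT_THRESHOLD):
    """Return the most common transcription once enough people agree on it,
    otherwise None (the word stays unknown)."""
    counts = Counter(a.strip().lower() for a in answers)
    if not counts:
        return None
    word, votes = counts.most_common(1)[0]
    return word if votes >= threshold else None


# One deliberately wrong answer is simply outvoted.
print(consensus(["morning", "Morning", "xyzzy", "morning "]))  # morning
print(consensus(["morning", "xyzzy"]))                         # None
```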
There is no stopping the Google Books behemoth
Sorry Internet Archive, the work reCaptcha were doing for you ends. All of the proceeds now go towards Google Books.
I grabbed the page, cut out the Captcha picture, which says "Google acquires reCaptcha", saved it in OneNote, and asked OneNote to read the text from it.
"Goog2eAçqufrsreCAfTCRA" was what it saw. Took me about a minute. I would imagine most hacky types could do better easily.
Not very effective, is it?
Very true, though I personally can't stand WebVisum as a blind user. I find it just screws up pages all the time.
Best audio captcha I ever found was one which gave me a couple of bars from a song (the changing pitch would be hard for voice recognition to handle). This both saved me from having to mentally store the information and reduced the mental overhead (it is something which a human brain is used to interpreting, as opposed to random letter/number combinations), and gave it a sufficient sample size that it could account for a certain degree of error. "Though it may seem silly, home sweet home..." was the code I got. Very easy to understand, and it did specifically state that you don't have to be completely accurate.
Must agree though, captchas are more of a fad than a necessity. There must be something better we can come up with.
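I don't know how that audio captcha scored answers, but the "you don't have to be completely accurate" part could be done with a similarity ratio, for example using `difflib` from Python's standard library. The 0.8 threshold and the helper names are assumptions for this sketch.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case the text and drop punctuation and extra spaces."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))


def close_enough(expected, typed, threshold=0.8):
    """Accept the typed lyric if it is sufficiently similar to the expected one."""
    ratio = SequenceMatcher(None, normalise(expected), normalise(typed)).ratio()
    return ratio >= threshold


lyric = "Though it may seem silly, home sweet home"
print(close_enough(lyric, "tho it may seem silly home sweet hom"))  # True
print(close_enough(lyric, "mary had a little lamb"))                # False
```

A longer phrase gives the ratio more to work with, which fits the point above about sample size.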