Computer scientists have developed software that easily defeats audio CAPTCHAs offered on account registration pages of a half-dozen popular websites by exploiting inherent weaknesses in the automated tests designed to prevent fraud. Decaptcha is a two-phase audio-CAPTCHA solver that correctly breaks the puzzles with a 41- …
I love it! After years of Turing contests, no one could make a computer act like a human, but in a few years, with financial incentive, spammers can write programs to read text and hear words that I can't even figure out!
Money Talks ....
.... clearly and without semantic noise.
I was thinking that.
I would hate to be visually impaired as I can never make out those audio captchas either.
Irreparable implies that the fault cannot be repaired. Tweak the audio files to use the same semantic noise that Google uses and hey presto, the vulnerability is fixed. Seems repairable to me!
There's a much MUCH better way...
...simply add a question/answer option to your forms. The more questions you have, the more secure it becomes. It's also unique to your own site, so spammers won't bother trying to learn how to use it.
A few examples
What is <X> plus <Y>?
What is the first letter of this website's name?
If I have £1.50 and I buy a Mars Bar for 50p, how much money do I have left?
You get the idea. It's so simple and stupid that spam bots can't pick up the questions.
I've used this method for years and have never had a spammer get through.
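For anyone who wants to try it, here is a minimal sketch of the question/answer idea in Python. It is only an illustration, not what any particular site runs: the function names, the two question templates and the default site name "example" are all made up for the example.

```python
import random

def make_question(rng, site_name="example"):
    """Return a (question, expected_answer) pair.

    Both templates and the default site name are hypothetical.
    """
    kind = rng.choice(["sum", "first_letter"])
    if kind == "sum":
        x, y = rng.randint(1, 9), rng.randint(1, 9)
        return f"What is {x} plus {y}?", str(x + y)
    return "What is the first letter of this website's name?", site_name[0].lower()

def check_answer(expected, submitted):
    """Compare answers, ignoring case and surrounding whitespace."""
    return submitted.strip().lower() == expected

if __name__ == "__main__":
    question, answer = make_question(random.Random())
    print(question)
    print(check_answer(answer, input("> ")))
```

The form would store the expected answer server-side (for instance in the session) and call check_answer on submission.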
Re: There's a much MUCH better way...
But how does that work for the visually impaired, which is what the main article was discussing?
Do you have sound files that read out the questions?
Works if you ask in Finnish
One Finnish news site I frequent uses this idea when submitting reader comments. It presents a trivial arithmetic question ("what is two plus five?") as Finnish words. I have yet to see any comment spam there, even though the "puzzle" has remained exactly the same for years. Probably having it in Finnish is enough to deter most spammers.
It would not work in English: you could simply use Wolfram Alpha to break it. Just tried asking it "what is two plus five" and the answer came back, in several representations of the number 7.
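You don't even need Wolfram Alpha for the English version. A rough sketch of how little code it takes to answer a spelled-out arithmetic question follows; the word lists and the solve function are invented for this example and say nothing about how Wolfram Alpha itself works.

```python
from operator import add, mul, sub

# Hypothetical vocabulary: just enough for simple one-operator questions.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
OPERATOR_WORDS = {"plus": add, "minus": sub, "times": mul}

def solve(question):
    """Answer questions like 'what is two plus five?'; return None otherwise."""
    tokens = question.lower().strip(" ?").split()
    numbers = [NUMBER_WORDS[t] for t in tokens if t in NUMBER_WORDS]
    operators = [OPERATOR_WORDS[t] for t in tokens if t in OPERATOR_WORDS]
    if len(numbers) != 2 or len(operators) != 1:
        return None  # not a question this toy solver understands
    return operators[0](numbers[0], numbers[1])

if __name__ == "__main__":
    print(solve("What is two plus five?"))
```

Swapping the dictionaries for Finnish words would take a spammer a few minutes, which suggests the Finnish site is protected by obscurity rather than by the puzzle itself.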
If only there was some way to synthesise human speech given the text form of the question. Alas, I'm sure such super technology is as far beyond us as magic such as fuel cells and super conductors.
If your preferred screen reader can't cope with text and forms you aren't going to have much luck using a web forum or similar anyway.
thought I had WolframAlpha stumped
by setting the question as half a baker's dozen - it got that
but it didn't like "a score of baker's dozens" which it turned into 1.538.
perhaps a strong regional accent in audio captchas.......
I saw one of those the other day
On a website that had gone for the belt-and-braces approach of a question plus an image captcha.
The question was "what color are clouds". To which "white", "grey", and "gray" (given "color") would all seem to be possible answers. I can't remember which I chose initially but it didn't like it....
Sure, but when I typed "Graham's number" into Wolfram Alpha, it didn't give me any representations of it - ergo it's just not that good.
What is the success rate for breaking "question captchas"? For example, "half of six times one". Are natural language processing techniques good enough to parse the meaning of such text?
(and if not, as VoodooTrucker pointed out, I'm sure the academics would appreciate some help from the spammers)
Try this link:
big f & deal
None of this answers the question of why I need to Sign Up in order to click the freaking Like / Dislike button in the first place.
You only broke the Audio CAPTCHA. Reporting this smells of wet blanket.
CAPTCHAS are getting too difficult to solve anyway, move on to riddles.
Get the buggers to crack the Voynich Manuscript - THEN they can post their amusing cat photos.
This is not new. Search Google and you'll find many examples of this being done for years. Why do you say it's irreparable? That's nonsense. They'll just improve the audio captchas, the same way as image captchas have improved over the years.
They don't need to make them 100% uncrackable. They just need to slow people down. At the end of the day, any serious spammer pays for captcha cracking nowadays. 1000 captchas for $1 is the going rate. Usually outsourced to the Philippines/India.
I'd like a copy please
I can't read these things half the time, a program to do it for me would be quite useful.
I see the problem.
Because of known research into noise reduction, we already have robust mechanisms for audio analysis. So computing capability runs headlong into legal obligations. Many sites MUST have audio CAPTCHAs to accommodate laws protecting the disabled, but this obligation itself becomes a tool that can be used by miscreants. It's like having a GPS that knows how to get you home...and then someone steals it.
I'm not too clear as to what kind of Semantic Noise Google uses, but I think what they're saying is that you can't use visual techniques in audio CAPTCHAs. You have to go about them from a completely different angle: instead of the CAPTCHA just saying a few words, have it speak a specific task to perform (and to defeat simple speech recognition, it cannot be a simple instruction; give an instruction that involves some thinking, like "Enter the sum of 2 and 8, as a word." or "Type, in order, all the vowels in the word 'deviate'.").
Where can I buy it?
I can't decipher captchas, it'd be a really useful tool to have on the desktop.
Cool. Now, let's abolish TextCaptcha, and shoot whoever it was that invented it.
"Is it a \, or just a squinted I? Oh, I'll give it a try... arses. Now, this one, is it a t, or an l with a one of those little sideways dashes in a bad place? ...oh, ffs."
@ dave b 1
Or is that an "n"... no, it's an upside-down "u"
had one the other day
That had old-fashioned "S"s in it, the ones that look like fancy "f"s.
Perhaps it was to keep out young people......
you do realize...
That you're providing a free decoding service for the Google book-scanning project. The secondary purpose is to prevent comment spam.
I used to do my part to get rid of the easy-to-decode captchas by spending 3 hours/night decoding them (yes, I was bored and suffering insomnia). Now, even I have a hard time deciphering them, since they appear to have run out of English words.
Spam account farms
Alas, you can't beat the sweatshops of the world, and people are supposed to beat your "hard for computer" tests (though in truth, sometimes I take a few goes myself at some of the mangled text tests these days!).
If only people didn't follow spam emails and links, the market would dry up and die...
Use visual/audio illusions that affect only the human brain
It seems that captchas are trying to keep ahead of research into computer comprehension of text and spoken voice.
It would be cool if websites identified legitimate human beings using visual or auditory illusions which can be picked up by the human brain, but not easily deciphered by a computer. I'm not even sure if these exist -- certainly I imagine that using binaural beats would not work, since I suspect that the frequency of the beats is easy for a computer to calculate.
"It would be cool if websites identified legitimate human beings using..."
Sorry to pick on you, but no it wouldn't.
Even a 100% foolproof mechanism (no false positives and no false negatives) is doomed to failure because (as an earlier comment pointed out) the best scripts actually use human beings to break through the captchas. Captchas solve the wrong problem.
With the world as it is, you have an army of poor people willing to help the crooks spam the rich-but-stupid people. To truly solve the problem, you'd have to either solve world poverty or get rid of all the stupid people. Take your pick, and good luck with either, but don't hold your breath.
As I understand it, a bot scrapes a website, fills out whatever form it is trying to fill out, and passes the captcha to the human 'solvers.' They solve the captcha, and the bot takes the response and feeds it back into the form.
However, if the mechanism related to something on the website itself (eg, "What animal is this website's mascot", "type the name of this website into this box", "What color is the background of the logo", "What font most resembles the website's font", and so on), that approach won't work. Taking the captcha out of the website will make the answer impossible to find.
Of course, chances are that the 'solvers' the crooks are using only speak one language, so using a captcha that uses natural language would defeat them handily enough in every language but their own. If you have a different captcha (or set thereof) for each language you support, and you notice one language group has a lot more bots making it through, you'll get a pretty good idea as to where the 'solvers' are from... dunno what you'd do with that information, but it would be interesting data, anyway.
The botherders aren't THAT stupid. If the CAPTCHA requires context, they provide the mooks with the appropriate context (such as a picture of the site scrape). And as for the language barrier, they simply make sure their mooks are from certain countries or are of a certain level of language comprehension. A little more work, but nothing compared to the rewards.
Oh I know!
Hackers are using DOLBY technology to remove background noise! HAHAHAHAHAHA
Need an audio version of kittenauth
Maybe "Which one is a cat? Press 1 to 4." and play four animal sounds.
Or a "what animal is this?" type thingy.
Something that speech-to-text + search engine won't have a hope with.
That method would only have four guesses...
Instead, try: "type the word that was followed by a cat sound." That way, you could have a lot of captchas. Even "What animal was guessed correctly?" and play animal sounds and animal names together ("This is a dog, baaa, this is a cat, meow, this is a bird, growl")
Yep, sounds like a winner
Plus, it could be extended when they start profiling the sounds (like the staccato sound of the sheep). A pneumatic road digger, put through the right filters, would be hard to tell apart from the bleat of a sheep for a computer.
Failing that, get someone with a strong accent to read out the words on the current captchas.
A human can (usually) work out what is being said.
When they start encoding speech to text to handle UK accents, it will be a bonus for anyone that's tried voice recognition booking systems.
I think it should be the other way around,
You see a word, you speak it, and the server analyzes whether it was spoken by a human or by a speech-synthesis computer.
I have never ever heard a computer voice that was as natural as a human.
I noticed a stunning flaw in many CAPTCHA systems too...
Half of them are bloody unreadable. I know the idea is so computers can't read them and thus it prevents bots, but half the time I can't read them either. They're especially annoying when paired with websites that reset the password fields on failure and websites which don't check if a username is free before you hit the submit button....it's a barrel of laughs trying 7 captchas, finally getting one right and then being told your username choice is already taken...
What an odd conclusion
First they note that all current audio captchas _except_ Recaptcha can be defeated. They then go on to conclude:
"As a result, we suspect that it may not be possible to design secure audio captchas that are usable by humans using current methods. It is therefore important to explore alternative approaches."
Excuse me? They've demonstrated that many sites use an easily defeated approach, when there's one available that's still undefeated. Isn't semantic noise a "current method"? What alternative approaches need to be explored exactly?
Riddles should be good
As I was going to St. Ives
I met a man with seven wives....
Any unique or novel system you employ for a smallish website will work simply because they won't bother putting in the man hours to create a bot just for your site.
When it comes to signing up for webmail/IM accounts ... the same tricks simply won't work. Create 1000 questions and someone will make a database of the 1000 matching answers; ask math questions and they will write code to parse the equations and solve them. Asking the user any multiple choice question (like which image is a cat) fails because they can simply guess and still get an economically viable success rate.
So if you are a little guy ... yeah, you can come up with something unique and clever and be spam free. If you are Microsoft, Yahoo, Google ... there are no easy answers.
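To put rough numbers on the guessing point, here is a back-of-envelope sketch. It assumes a bot that picks uniformly at random among equally likely choices and that the site does not lock out or rate-limit failed attempts; the function names are made up for illustration.

```python
def guess_success_rate(choices):
    """Chance that one uniformly random guess is right."""
    return 1.0 / choices

def expected_successes(attempts, choices):
    """Average number of correct guesses over a batch of attempts."""
    return attempts * guess_success_rate(choices)

def expected_attempts_per_success(choices):
    """Mean number of tries until the first correct guess (geometric distribution)."""
    return float(choices)

if __name__ == "__main__":
    # A four-way "which one is a cat?" question, tried 1000 times.
    print(guess_success_rate(4))
    print(expected_successes(1000, 4))
    print(expected_attempts_per_success(4))
```

With four choices, a quarter of blind attempts get through, so a bot making thousands of attempts still ends up with hundreds of accounts.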
You don't use a database.
Or rather, you use the database to hold PIECES of your puzzle. To construct the actual puzzle, you take the pieces and mix them together. Then the number of combinations can add up dramatically. Add different rules for each possible phrase (such as switching between stating all the vowels to stating the sum to stating letters 5-7). The more arrangements you make, the trickier it becomes for a speech recognizer to pick out the task to do. You can also use phrases that can change depending on context ("recognize speech" vs. "wreck a nice beach"), which can easily trip up speech recognition.
Years ago a blogger I read complained that his site was being botspammed by over a thousand stupid comments a day. He implemented a captcha. By day three the captcha was always the same and easily guessable even if the distortion was too bad to read the damned thing. Seems it wasn't worth the bots' time and they just moved on no matter what was used. His (s)hit rate dropped from 1K+ to ~3 per day.
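A small sketch of the "pieces" idea, in Python. The word list, the three rules and every function name here are hypothetical, and a letter-count rule stands in for the "sum" rule to keep it short; a real deployment would need far more pieces than this.

```python
import random

# Hypothetical puzzle pieces: a handful of words and a handful of rules.
WORDS = ["deviate", "captcha", "register", "woodchuck"]

def vowels(word):
    """All vowels of the word, in order."""
    return "".join(c for c in word if c in "aeiou")

def letters_5_to_7(word):
    """Letters 5 through 7 of the word, counting from 1."""
    return word[4:7]

def letter_count(word):
    """Number of letters, as a string of digits."""
    return str(len(word))

RULES = [
    ("Type, in order, all the vowels in the word '{w}'.", vowels),
    ("Type letters 5 to 7 of the word '{w}'.", letters_5_to_7),
    ("Type the number of letters in the word '{w}'.", letter_count),
]

def make_puzzle(rng):
    """Mix one word with one rule; return (spoken_task, expected_answer)."""
    word = rng.choice(WORDS)
    template, rule = rng.choice(RULES)
    return template.format(w=word), rule(word)

def puzzle_count():
    """Number of distinct puzzles: every word paired with every rule."""
    return len(WORDS) * len(RULES)

if __name__ == "__main__":
    print(make_puzzle(random.Random()))
    print(puzzle_count())
```

The count grows as words times rules, so adding either kind of piece multiplies the number of puzzles rather than adding to it.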
I've always used a policy of moderating posts until the poster says something remarkably on-topic. This has successfully blocked bots, idiots and boring people from ruining the peace and quiet, but I can see the approach wouldn't scale to TwitFace levels.
Google Semantic Captcha
I find the conclusion strange too, when the article obviously states that Google's reCAPTCHA system hasn't been defeated.
It's interesting because technically, even the system itself does not know the answer to one of the words presented! This is because Google is actually using us, the recaptcha users, as a way to recognize words from old books and print that have failed OCR. See:
I just posted the same thing as a reply to a message above... I guess I should have read all of the posts before posting myself...
Gotta love wolfram alpha...
Up until now I had no idea how much wood a woodchuck could chuck...
Anyway, back to the topic. It's amazing that there are actually still sites out on the net that don't implement any form of captcha. I remember once suggesting it in an email to gumtree, and the reply I received was so snotty that I never visited the site again. Hopefully the moron who replied to me has been sacked for incompetence by now, and they have implemented something, but I won't hold my breath.