For decades, web applications have suffered the consequences of malicious bot traffic. Bots can be coordinated to gather data from a site, abuse mechanics of a site, or even disrupt its services with a full-scale DDoS (Distributed Denial of Service) attack.
To combat the increasing threat of bots and enable bot management, experts developed CAPTCHA and reCAPTCHA tests, designed to tell humans apart from bots. For many years, they were gold standards in bot mitigation, but today, they are not as attractive as they once were.
What exactly are these methods, and what are their strengths and weaknesses?
What Is CAPTCHA?
CAPTCHA is a contrived acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.” It’s a kind of reverse Turing test. In a Turing test, a human judge attempts to discern whether a subject is a computer or human; the computer “passes” the test if it deceives the human judge. With CAPTCHA, a computer attempts to discern whether a subject is a computer or human; the human “passes” the test if they complete a task successfully.
How It Works
Conventional CAPTCHA tests take the form of a distorted image of a word (or phrase) on the user’s screen. These phrases are randomly generated and served automatically, with no manual intervention needed to deploy them. The human visitor to a site must type the words to solve the puzzle; if they enter the words correctly, they “pass.”
This test was originally almost impossible for bots to solve, effectively weeding them out. There are three major reasons why this was the case:
- Variable letter shapes and sizes. CAPTCHA words tend to be highly distorted, with letters being warped so they can’t be easily discerned by automated bots. However, to a human, it should be easy to identify a letter even if it’s somewhat distorted.
- Spacing and segmentation. Characters in CAPTCHA words tend to be crowded close together, making it hard for a bot to tell where one letter ends and another begins. Again, to a human, this task is trivial.
- Context and identification. Human beings are good at recognizing a word as a whole, rather than identifying each individual letter like a bot would. A bot, by contrast, may struggle to read a word without seeing each letter clearly.
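The challenge-and-response cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular CAPTCHA product is built: the names `issue_challenge` and `verify_response`, the alphabet, the signed token, and the two-minute expiry are all hypothetical, and the step that renders the text as a distorted image is left out entirely.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

# Hypothetical alphabet: look-alike characters (0/O, 1/I/L) are left out
# so that a human is not penalized for an honest misreading.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"

# Per-server secret, assumed for this sketch; a real deployment would persist it.
SECRET_KEY = secrets.token_bytes(32)


def _sign(answer: str, issued_at: int) -> str:
    """Bind the expected answer to its issue time without storing the answer."""
    message = f"{answer.upper()}|{issued_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(length: int = 6, now: Optional[float] = None) -> Tuple[str, Dict]:
    """Return the random text to render as a distorted image, plus a token."""
    issued_at = int(time.time() if now is None else now)
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = {"issued_at": issued_at, "sig": _sign(answer, issued_at)}
    return answer, token


def verify_response(response: str, token: Dict, ttl: int = 120,
                    now: Optional[float] = None) -> bool:
    """The visitor passes if the typed text matches and the token is fresh."""
    current = time.time() if now is None else now
    if current - token["issued_at"] > ttl:
        return False  # expired: too slow, or an old token being replayed
    expected = _sign(response.strip(), token["issued_at"])
    return hmac.compare_digest(expected, token["sig"])
```

Only the token would travel with the form; the plain answer exists solely to be drawn into the image. The sketch does not track which tokens have already been used, so a production system would also need replay protection.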
Over the years, CAPTCHA systems have introduced more advanced puzzles, including general knowledge questions and, notably, visual identification puzzles. For example, a human user may be asked to identify images that contain a specific object.
What Is reCAPTCHA?
reCAPTCHA is a human verification service similar in nature to CAPTCHA. It originated at Carnegie Mellon University and has been owned by Google since 2009. Initially, this system was designed with CAPTCHA-like puzzles, but with a twist: the puzzle words were scanned from books that needed to be digitized, so each solved puzzle also helped crowdsource the transcription of words that software could not read.
How It Works
For the most part, simple reCAPTCHA puzzles function like CAPTCHA puzzles. reCAPTCHA also provides image identification puzzles, prompting users to pick out images that contain specific things, like street signs or fire hydrants; the answers feed Street View image analysis.
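The book-digitizing twist is easier to picture with a small sketch. The article does not spell out the mechanism, so the following assumes the publicly described design of the original reCAPTCHA: each puzzle pairs a control word whose answer is known with a scanned word whose answer is not, and a guess for the unknown word counts as a vote only when the control word is typed correctly. The class name, the identifiers, and the vote threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


class WordDigitizer:
    """Collects human guesses for scanned words that software could not read."""

    def __init__(self, votes_needed: int = 3) -> None:
        self.votes_needed = votes_needed          # assumed agreement threshold
        self.votes: Dict[str, Counter] = {}       # unknown word id -> guess counts

    def submit(self, control_answer: str, control_truth: str,
               unknown_id: str, unknown_guess: str) -> bool:
        """Return True if the user passes; record their guess only in that case."""
        if control_answer.strip().lower() != control_truth.lower():
            return False  # failed the known word, so the other guess is not trusted
        guess = unknown_guess.strip().lower()
        self.votes.setdefault(unknown_id, Counter())[guess] += 1
        return True

    def transcription(self, unknown_id: str) -> Optional[str]:
        """Return the agreed reading once enough users have given the same guess."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

In this sketch the user never learns which of the two words is the control, which is what makes the unpaid transcription trustworthy enough to aggregate.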
Why CAPTCHA and reCAPTCHA Are Increasingly Harmful to User Experience
CAPTCHA and reCAPTCHA may be somewhat useful for filtering bot traffic (though not universally so; more on that later), but they tend to be annoying for users. If used improperly or excessively, they can detract from the experience of users on your site.
- The extra step. Most people don’t want to have to solve a puzzle before completing a transaction or accessing content. Though a typical CAPTCHA puzzle takes a human only about 10 seconds to solve, in the fast-paced world of the web, that’s a relatively long time.
- Inscrutability. Most people wouldn’t mind solving a quick puzzle if it were truly quick. Unfortunately, the very mechanisms used to deceive bots can also make a CAPTCHA puzzle inscrutable to a human being. You may have encountered this in your own life; have you ever seen a CAPTCHA word that was practically impossible to decipher?
- Increasing difficulty. The difficulty of CAPTCHA and reCAPTCHA puzzles is also increasing. A decade ago, the most complex task you’d be asked to do was typing out a simple word. Today, you might spend a minute or more trying to identify Japanese street signs, wondering whether a tiny corner of a sign in an image means that the image “contains” a sign.
- Disabilities. CAPTCHAs are difficult enough for people with typical vision and motor skills. For people with visual or motor impairments, they can be far harder to complete, which reduces the accessibility of your site.
Why CAPTCHA and reCAPTCHA Are Increasingly Ineffective
You might tolerate the negative user experience associated with CAPTCHA puzzles if they were guaranteed to catch all non-human or illegitimate traffic. The unfortunate reality is that they aren’t. In fact, they’re getting worse as bot detection methods, for several reasons:
- History and familiarity. CAPTCHA puzzles have been around since the early 2000s, so hackers have had plenty of time to adapt to these puzzles. More advanced CAPTCHA puzzles introduce new, more challenging problems to solve, but they still follow many of the same principles.
- Machine learning and AI. We also need to consider the emergence of machine learning tools and artificial intelligence (AI). Today’s machine learning technology can easily be trained to solve CAPTCHAs and reCAPTCHAs with 70 percent accuracy or higher. In fact, CAPTCHA puzzles serve as great training fodder for these advanced systems. Given enough time, even a rudimentary AI can learn to beat the best CAPTCHA puzzles available.
- Sweatshops and manual labor. Some CAPTCHA puzzles can be beaten without the help of bots or machine learning; instead, the work of solving CAPTCHA puzzles is outsourced to cheap human labor. These de facto sweatshops can have dozens of people staring at screens and solving puzzles all day, which defeats both the traffic limits and the human verification that the CAPTCHA system is meant to provide.
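To see why a solver with 70 percent accuracy is a serious problem, consider what happens when a bot is allowed to retry. The sketch below assumes each attempt succeeds independently with the same probability and that the site permits several attempts; both are simplifying assumptions, and the function names are hypothetical.

```python
def pass_probability(accuracy: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Every try must fail for the bot to be kept out: (1 - accuracy) ** attempts.
    return 1.0 - (1.0 - accuracy) ** attempts


def expected_attempts(accuracy: float) -> float:
    """Average number of tries until the first success (geometric distribution)."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be greater than 0 and at most 1")
    return 1.0 / accuracy
```

Under these assumptions, a solver that is right 70 percent of the time gets through within three attempts about 97 percent of the time (1 - 0.3^3 = 0.973) and needs fewer than 1.5 attempts on average, so the puzzle slows such a bot down far more than it stops it.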