This post refers to Google ReCaptcha v2 (not the latest version)
Recently Google introduced a simplified "captcha" verification system (video) that enables users to pass the "captcha" just by clicking on it.
But how can it differentiate a bot from a person just by a click?
As per this answer, (assuming a similar implementation), at first "recaptcha" generates a hidden key and attaches it to a hidden input element and also lazily renders a check box (not an actual check box input
but a div
) with the same key which when clicked, sends an asynchronous request (XHR) to the Google backend servers to mark it as a valid verification key (i.e. a key that has to be validated when the form is submitted).
But why can't bots automate that click (at least, browser-based bots)?
How might this work?
reCAPTCHA v2 (Invisible reCAPTCHA badge) The invisible reCAPTCHA badge does not require the user to click on a checkbox, instead it is invoked directly when the user clicks on an existing button on your site or can be invoked via a JavaScript API call.
reCAPTCHA is a free service that protects your website from spam and abuse. reCAPTCHA uses an advanced risk analysis engine and adaptive CAPTCHAs to keep automated software from engaging in abusive activities on your site. It does this while letting your valid users pass through with ease.
Is reCAPTCHA v3 better than v2? Neither of them is good at blocking bots. While reCAPTCHA v3 is less intrusive than v2 for a user, it places a significant burden on the webmaster to determine when to let users through and when to block or challenge them. There's no right answer to this.
Google ReCaptcha uses cookies to track the user's behaviour across your website. In practice this means you must display a notice that gives users the chance to accept cookies or not. You cannot load cookies before consent has been given.
This is speculation, but based on Google's reference to the "risk analysis engine" they use (http://googleonlinesecurity.blogspot.com/2014/12/are-you-robot-introducing-no-captcha.html)
I would assume it looks at how you behaved prior to clicking, how your cursor moved on its way to the check (organic path/acceleration), which part of the checkbox was clicked (random places, or dead on center every time), browser fingerprint, Google cookies & contents, click location history tied to your fingerprint or account if it detects one etc.
It's fairly difficult to fake "organic" behavior in such a way that it would fool a continuously learning pattern detection engine. In the cases where it's not sure, it still prompts you to match an actual CAPTCHA string.
A new paper has been released with several tests against reCAPTCHA:
https://www.blackhat.com/docs/asia-16/materials/asia-16-Sivakorn-Im-Not-a-Human-Breaking-the-Google-reCAPTCHA-wp.pdf
Some highlights:
Google has already fixed the cookie vulnerability and is probably restricting some behaviors based on IPs.
Another interesting finding is that Google runs a VM in JavaScript that obfuscates much of reCAPTCHA code and behavior. This VM is known as botguard and is used to protect other services besides reCAPTCHA:
https://github.com/neuroradiology/InsideReCaptcha
UPDATE 2017
A recent paper (from August) was published on WOOT 2017 achieving 85% accuracy in solving noCAPTCHA reCAPTCHA audio challenges:
http://uncaptcha.cs.umd.edu/papers/uncaptcha_woot17.pdf
UPDATE 2018
Google is introducing reCAPTCHA v3, which looks like a "human score prediction engine" that is calibrated per website. It can be installed into different pages of a website (working like a Google Analytics script) to help reCAPTCHA and the website owner to understand the behaviour of humans vs. bots before filling a reCAPTCHA.
https://www.google.com/recaptcha/intro/v3beta.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With