Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Google reCAPTCHA v2 work behind the scenes?

This post refers to Google ReCaptcha v2 (not the latest version)

Recently Google introduced a simplified "captcha" verification system (video) that enables users to pass the "captcha" just by clicking on it.

But how can it differentiate a bot from a person just by a click?

As per this answer, (assuming a similar implementation), at first "recaptcha" generates a hidden key and attaches it to a hidden input element and also lazily renders a check box (not an actual check box input but a div) with the same key which when clicked, sends an asynchronous request (XHR) to the Google backend servers to mark it as a valid verification key (i.e. a key that has to be validated when the form is submitted).

But why can't bots automate that click (at least, browser-based bots)?

How might this work?

like image 761
everlasto Avatar asked Dec 04 '14 04:12

everlasto


People also ask

How does reCAPTCHA v2 invisible work?

reCAPTCHA v2 (Invisible reCAPTCHA badge) The invisible reCAPTCHA badge does not require the user to click on a checkbox, instead it is invoked directly when the user clicks on an existing button on your site or can be invoked via a JavaScript API call.

How does Google reCAPTCHA work?

reCAPTCHA is a free service that protects your website from spam and abuse. reCAPTCHA uses an advanced risk analysis engine and adaptive CAPTCHAs to keep automated software from engaging in abusive activities on your site. It does this while letting your valid users pass through with ease.

Is reCAPTCHA v2 better than v3?

Is reCAPTCHA v3 better than v2? Neither of them is good at blocking bots. While reCAPTCHA v3 is less intrusive than v2 for a user, it places a significant burden on the webmaster to determine when to let users through and when to block or challenge them. There's no right answer to this.

Does reCAPTCHA v2 use cookies?

Google ReCaptcha uses cookies to track the user's behaviour across your website. In practice this means you must display a notice that gives users the chance to accept cookies or not. You cannot load cookies before consent has been given.


2 Answers

This is speculation, but based on Google's reference to the "risk analysis engine" they use (http://googleonlinesecurity.blogspot.com/2014/12/are-you-robot-introducing-no-captcha.html)

I would assume it looks at how you behaved prior to clicking, how your cursor moved on its way to the check (organic path/acceleration), which part of the checkbox was clicked (random places, or dead on center every time), browser fingerprint, Google cookies & contents, click location history tied to your fingerprint or account if it detects one etc.

It's fairly difficult to fake "organic" behavior in such a way that it would fool a continuously learning pattern detection engine. In the cases where it's not sure, it still prompts you to match an actual CAPTCHA string.

like image 121
AgmLauncher Avatar answered Oct 14 '22 12:10

AgmLauncher


A new paper has been released with several tests against reCAPTCHA:

https://www.blackhat.com/docs/asia-16/materials/asia-16-Sivakorn-Im-Not-a-Human-Breaking-the-Google-reCAPTCHA-wp.pdf

Some highlights:

  • By keeping a cookie active for +9 days (by browsing sites with Google resources), you can then pass reCAPTCHA by only clicking the checkbox;
  • There are no restrictions based on requests per IP;
  • The browser's user agent must be real, and Google run tests against your environment to ensure it matches the user agent;
  • Google tests if the browser can render a Canvas;
  • Screen resolution and mouse events don't affect the results;

Google has already fixed the cookie vulnerability and is probably restricting some behaviors based on IPs.

Another interesting finding is that Google runs a VM in JavaScript that obfuscates much of reCAPTCHA code and behavior. This VM is known as botguard and is used to protect other services besides reCAPTCHA:

https://github.com/neuroradiology/InsideReCaptcha

UPDATE 2017

A recent paper (from August) was published on WOOT 2017 achieving 85% accuracy in solving noCAPTCHA reCAPTCHA audio challenges:

http://uncaptcha.cs.umd.edu/papers/uncaptcha_woot17.pdf

UPDATE 2018

Google is introducing reCAPTCHA v3, which looks like a "human score prediction engine" that is calibrated per website. It can be installed into different pages of a website (working like a Google Analytics script) to help reCAPTCHA and the website owner to understand the behaviour of humans vs. bots before filling a reCAPTCHA.

https://www.google.com/recaptcha/intro/v3beta.html

like image 20
barbolo Avatar answered Oct 14 '22 10:10

barbolo