How does reCAPTCHA 3 know I'm using Selenium/chromedriver?

Tags: selenium, web-scraping, selenium-chromedriver, recaptcha, recaptcha-v3

I'm curious how reCAPTCHA v3 works, specifically the browser fingerprinting.

When I launch an instance of Chrome through Selenium/chromedriver and test it against reCAPTCHA v3 (https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php), I always get a score of 0.1.

With a normal Chrome instance in incognito mode, I get 0.3.

I've beaten other detection systems by injecting JavaScript that modifies the webdriver object, and by recompiling webdriver from source with modified $cdc_ variables.

I can see what looks like some obfuscated POST back to the server, so I'm going to start digging there.

What might it be looking for to determine if I'm running Selenium/chromedriver?

609

asked Apr 03 '19 18:04

Mr J

2 Answers

reCAPTCHA

Websites can readily analyze network traffic and identify your program as a bot. Google has released five reCAPTCHA variants. Four of them are active and can be chosen when creating a new site, while reCAPTCHA v1 has been shut down.

reCAPTCHA versions and types

reCAPTCHA v3 (verify requests with a score): reCAPTCHA v3 allows you to verify if an interaction is legitimate without any user interaction. It is a pure JavaScript API returning a score, giving you the ability to take action in the context of your site: for instance requiring additional factors of authentication, sending a post to moderation, or throttling bots that may be scraping content.
reCAPTCHA v2 - "I'm not a robot" Checkbox: The "I'm not a robot" Checkbox requires the user to click a checkbox indicating the user is not a robot. This will either pass the user immediately (with No CAPTCHA) or challenge them to validate whether or not they are human. This is the simplest option to integrate with and only requires two lines of HTML to render the checkbox.

reCAPTCHA v2 - Invisible reCAPTCHA badge: The invisible reCAPTCHA badge does not require the user to click on a checkbox. Instead, it is invoked directly when the user clicks on an existing button on your site, or it can be invoked via a JavaScript API call. The integration requires a JavaScript callback when reCAPTCHA verification is complete. By default, only the most suspicious traffic will be prompted to solve a captcha. To alter this behavior, edit your site security preference under advanced settings.

reCAPTCHA v2 - Android: The reCAPTCHA Android library is part of the Google Play services SafetyNet APIs. This library provides native Android APIs that you can integrate directly into an app. You should set up Google Play services in your app and connect to the GoogleApiClient before invoking the reCAPTCHA API. This will either pass the user through immediately (without a CAPTCHA prompt) or challenge them to validate whether they are human.
reCAPTCHA v1: reCAPTCHA v1 has been shut down since March 2018.
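
To make the v3 item above more concrete, here is a minimal server-side sketch of taking action based on the returned score. It assumes the documented siteverify endpoint and its `success`, `score` and `action` reply fields. The function names and the 0.7 / 0.3 thresholds are illustrative assumptions, not values recommended by reCAPTCHA.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret, token, timeout=10):
    """POST the client token to siteverify and return the parsed JSON reply."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=timeout) as resp:
        return json.load(resp)


def choose_action(result, expected_action, allow_at=0.7, review_at=0.3):
    """Map a siteverify reply to 'allow', 'challenge' or 'block'.

    The thresholds are assumptions chosen for illustration only.
    """
    # Reject failed verifications and tokens issued for a different action.
    if not result.get("success") or result.get("action") != expected_action:
        return "block"
    score = result.get("score", 0.0)
    if score >= allow_at:
        return "allow"
    if score >= review_at:
        return "challenge"  # e.g. ask for an additional authentication factor
    return "block"
```

Under these assumed thresholds, the scores reported in the question would map to "block" (0.1) and "challenge" (0.3).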

Solution

However, there are some generic approaches to avoid being detected while web scraping:

One of the first attributes a website can use to identify your script is the screen or window size, so it is recommended not to use the conventional viewport.
If you need to send multiple requests to a website, change the User-Agent on each request. There is a detailed discussion in "Way to change Google Chrome user agent in Selenium?".
To simulate human-like behavior, you may need to slow the script down beyond what WebDriverWait and expected_conditions provide by adding time.sleep(secs). There is a detailed discussion in "How to sleep webdriver in python for milliseconds".
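
The three points above could be sketched roughly as follows. This is an illustration under assumptions, not a recipe known to change a reCAPTCHA score: the window sizes, the user-agent strings and the helper names (`chrome_arguments`, `human_pause`, `build_driver`) are hypothetical. `--window-size` and `--user-agent` are standard Chrome command-line flags.

```python
import random
import time

# Hypothetical pools for illustration; substitute values that suit your case.
WINDOW_SIZES = [(1366, 768), (1440, 900), (1536, 864), (1920, 1080)]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def chrome_arguments(rng=random):
    """Pick one window size and one user agent as Chrome command-line flags."""
    width, height = rng.choice(WINDOW_SIZES)
    return [
        f"--window-size={width},{height}",
        f"--user-agent={rng.choice(USER_AGENTS)}",
    ]


def human_pause(low=0.8, high=2.5, rng=random):
    """Sleep for a random interval between low and high seconds; return it."""
    delay = rng.uniform(low, high)
    time.sleep(delay)
    return delay


def build_driver():
    """Start Chrome with the randomized flags (needs Selenium and chromedriver)."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

A script would call `build_driver()` once per session and `human_pause()` between interactions, in addition to any WebDriverWait conditions it already uses.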

Outro

Some food for thought:

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Unable to use Selenium to automate Chase site login
Confidence Score of the request using reCAPTCHA v3 API

192

answered Oct 14 '22 17:10

undetected Selenium

Selenium and Puppeteer use some browser configurations that differ from those of a non-automated browser. Also, since some JavaScript functions are injected into the browser to manipulate elements, you need to create overrides to avoid detection.

There are some good articles explaining how Selenium and Puppeteer can be detected when running on a site with detection mechanisms:
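
As a rough sketch of what such an override might look like with Selenium for Python and Chrome: the code below applies a few commonly cited Chrome settings and registers a script through the DevTools Protocol so that it runs before any page script. The helper names are hypothetical. Whether any of this affects a reCAPTCHA v3 score is an open assumption, and the asker reports still scoring 0.1 after similar changes.

```python
# JavaScript evaluated before page scripts; it redefines navigator.webdriver.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', "
    "{get: () => undefined});"
)


def configure_options(options):
    """Apply commonly cited settings to a ChromeOptions-like object."""
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    return options


def install_override(driver):
    """Register the script via the Chrome DevTools Protocol."""
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )


def build_driver():
    """Start Chrome with the settings and override (needs Selenium)."""
    # Imported lazily so the helpers above can be exercised without Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome(options=configure_options(webdriver.ChromeOptions()))
    install_override(driver)
    return driver
```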

Detecting Chrome headless, new techniques - You can use it to write defensive code for your bot.

It is not possible to detect and block Google Chrome headless - it explains in a clear and sound way the differences that JavaScript code can detect between a browser launched by automated software and a real one, and also how to fake it.

GitHub - headless-cat-n-mouse - Example using Puppeteer + Python to avoid detection
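
To see the kind of differences such articles describe, a small probe can be run inside the automated browser. The sketch below is an assumption about signals a page could read (the `navigator.webdriver` flag, `cdc_` keys on `document`, plugin and language lists). It is not a description of what reCAPTCHA actually checks, and newer chromedriver builds may not expose all of these. The names `PROBE_JS`, `summarize` and `probe` are hypothetical.

```python
# JavaScript executed in the page; returns a plain object Selenium maps to a dict.
PROBE_JS = """
return {
  webdriver: navigator.webdriver === true,
  cdcKeys: Object.keys(document).filter(k => k.indexOf('cdc_') !== -1),
  pluginCount: navigator.plugins.length,
  languages: navigator.languages ? Array.from(navigator.languages) : []
};
"""


def summarize(report):
    """Turn the probe result into a list of human-readable warning flags."""
    flags = []
    if report.get("webdriver"):
        flags.append("navigator.webdriver is true")
    if report.get("cdcKeys"):
        flags.append("cdc_ keys present on document")
    if report.get("pluginCount", 0) == 0:
        flags.append("no browser plugins reported")
    if not report.get("languages"):
        flags.append("navigator.languages is empty")
    return flags


def probe(driver):
    """Run the probe in a live Selenium session and summarize the result."""
    return summarize(driver.execute_script(PROBE_JS))
```

Running `probe(driver)` in both an automated and a manually started browser gives a quick side-by-side view of which of these signals differ.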

answered Oct 14 '22 19:10

Striter Alfa
