Cloudflare and Chromedriver - cloudflare distinguishes between chromedriver and genuine chrome?

Question

I would like to use chromedriver to scrape some stories from fanfiction.net. I try the following:

from selenium import webdriver
import time

path = r'D:\chromedriver\chromedriver.exe'

browser = webdriver.Chrome(path)
url1 = 'https://www.fanfiction.net/s/8832472'
url2 = 'https://www.fanfiction.net/s/5218118'

browser.get(url1)
time.sleep(5)
browser.get(url2)

The first link opens (sometimes I have to wait 5 seconds). When I try to load the second URL, Cloudflare intervenes and asks me to solve captchas, which are not solvable; at least, Cloudflare does not recognize my solutions. This also happens if I enter the links manually in the chromedriver-controlled window (so in the GUI). However, if I do the same thing in normal Chrome, everything works fine (I do not even get the waiting period on the first link), even in private mode with all cookies deleted. I could reproduce this on several machines.

Now my question: my intuition was that chromedriver is just the normal Chrome browser, made controllable. What is the difference from normal Chrome, how does Cloudflare distinguish the two, and how can I mask my chromedriver as normal Chrome? (I do not intend to load many pages in a short time, so it should not look like a bot.) I hope my question is clear.

undetected Selenium · Accepted Answer

This error message...

Checking your browser before accessing

...implies that Cloudflare has identified your requests to the website as coming from an automated bot and is therefore denying you access to the application.
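
A scraper can check for this interstitial before trying to parse a page. The following is a minimal sketch, assuming the challenge page still contains the message quoted above. The helper name `looks_like_cloudflare_challenge` is hypothetical, and Cloudflare may change the wording at any time.

```python
# Marker text taken from the message quoted above; treat it as an assumption,
# since Cloudflare may change the wording.
CHALLENGE_MARKER = "Checking your browser before accessing"


def looks_like_cloudflare_challenge(page_source: str) -> bool:
    """Return True if the HTML appears to be the Cloudflare interstitial."""
    # Case-insensitive substring match on the marker text.
    return CHALLENGE_MARKER.lower() in page_source.lower()
```

With a Selenium driver, this could be called as `looks_like_cloudflare_challenge(driver.page_source)` after each `driver.get(...)`.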

Solution

In these cases, a potential solution is to use undetected-chromedriver to initialize the Chrome browsing context.

undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.
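
One well-known signal that differs between a driver-controlled browser and a regular one is the `navigator.webdriver` property, which the WebDriver specification sets to true under automation. The sketch below reads it through Selenium's `execute_script`. It only illustrates one such signal and is not a statement of what Cloudflare actually checks; the helper name `reports_webdriver_flag` is hypothetical.

```python
def reports_webdriver_flag(driver) -> bool:
    """Return True if the page's JavaScript sees navigator.webdriver as truthy.

    `driver` is any Selenium WebDriver instance (or an object exposing the
    same execute_script method).
    """
    # execute_script runs JavaScript in the current page and returns the result.
    return bool(driver.execute_script("return navigator.webdriver"))
```

Calling this once with a stock `webdriver.Chrome` instance and once with a `uc.Chrome` instance is one way to compare how the two appear to a page.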

Code Block:

import time

import undetected_chromedriver as uc

# Use the options class shipped with undetected-chromedriver
options = uc.ChromeOptions()
options.add_argument("--start-maximized")

# uc.Chrome downloads and patches the driver binary automatically
driver = uc.Chrome(options=options)

url1 = 'https://www.fanfiction.net/s/8832472'
url2 = 'https://www.fanfiction.net/s/5218118'

driver.get(url1)
time.sleep(5)  # brief pause between page loads
driver.get(url2)
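
For more than two stories, the same get-then-wait pattern can be wrapped in a loop. This is a sketch under stated assumptions: `fetch_pages` and its `delay_seconds` parameter are hypothetical names, and the function accepts any object with Selenium's `get` method and `page_source` attribute, such as the `driver` created above.

```python
import time


def fetch_pages(driver, urls, delay_seconds=5.0):
    """Load each URL in turn and return a dict mapping URL to page HTML."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause between loads, mirroring the time.sleep(5) in the answer.
            time.sleep(delay_seconds)
        driver.get(url)
        pages[url] = driver.page_source
    return pages
```

For example, `pages = fetch_pages(driver, [url1, url2])` would load both stories with a five-second pause in between.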

References

You can find a couple of relevant detailed discussions in:

Selenium app redirect to Cloudflare page when hosted on Heroku
How to bypass being rate limited ..HTML Error 1015 using Python

Tags: python-3.x, selenium, selenium-chromedriver, cloudflare, undetected-chromedriver

Tamar
