Chrome browser initiated through ChromeDriver gets detected

I am trying to use Selenium ChromeDriver in Python on the website www.mouser.co.uk. However, it is detected as a bot on the very first request.

Does anyone have an explanation for this? Here is the code I am using:

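    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    options = Options()
    options.add_argument("--start-maximized")
    # Selenium 4: the driver path goes through Service, the options through options=
    browser = webdriver.Chrome(service=Service('chromedriver.exe'), options=options)
    wait = WebDriverWait(browser, 30)
    browser.get('https://www.mouser.co.uk')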
asked Oct 16 '18 at 09:10 by lyesAlgerian


2 Answers

I tried to access https://www.mouser.co.uk/ with a few ChromeOptions, but the session was still detected and redirected to the Pardon Our Interruption page.

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    # Selenium 4 removed executable_path/chrome_options; pass a Service and options= instead
    service = Service(r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=options)
    driver.get("https://www.mouser.co.uk")
    # Wait until the flag link is clickable, then click it through JavaScript
    myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
    driver.execute_script("arguments[0].click();", myElement)
    

If you inspect the Pardon Our Interruption page, you will find that the <body> tag contains:

  • The class attribute dist-GlobalHeader
  • The class attribute dist-PageWrap

This is a clear indication that the website is protected by the bot-management provider Distil Networks, and that navigation driven by ChromeDriver is detected and subsequently blocked.
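The two class names above suggest a simple heuristic for checking, right after `driver.get(...)`, whether you landed on the block page instead of the real site. The sketch below uses only the standard library's `html.parser`; the function name `looks_like_distil_block_page` is hypothetical, and matching on exactly these two class names is an assumption based solely on the page inspected here (Distil may change its markup at any time).

```python
from html.parser import HTMLParser

# Class names observed on Distil's "Pardon Our Interruption" page (assumption: may change)
DISTIL_MARKERS = {"dist-GlobalHeader", "dist-PageWrap"}


class _BodyClassCollector(HTMLParser):
    """Collects every class token found on or inside the <body> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        if self.in_body:
            for name, value in attrs:
                if name == "class" and value:
                    self.classes.update(value.split())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def looks_like_distil_block_page(html: str) -> bool:
    """Heuristic: True if any Distil marker class appears within <body>."""
    parser = _BodyClassCollector()
    parser.feed(html)
    parser.close()
    return bool(DISTIL_MARKERS & parser.classes)
```

With Selenium you would pass the rendered HTML, e.g. `looks_like_distil_block_page(driver.page_source)`.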


Distil

As per the article There Really Is Something About Distil.it...:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium as a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".


Reference

You can find a couple of detailed discussions in:

  • Distil detects WebDriver driven Chrome Browsing Context
  • Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
  • Akamai Bot Manager detects WebDriver driven Chrome Browsing Context
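The second discussion above concerns the `navigator.webdriver` flag, which browsers under WebDriver control expose as `true`; it is one signal a page script can read, although exactly which checks Distil or Akamai run is not documented here. A minimal sketch for checking what your own session reports, assuming a Selenium 4 driver object; the helper name `reports_webdriver_flag` is hypothetical:

```python
from typing import Any


def reports_webdriver_flag(driver: Any) -> bool:
    """Return True if the current page sees a truthy navigator.webdriver.

    `driver` is any object with a Selenium-style execute_script(script) method,
    for example a selenium.webdriver.Chrome instance.
    """
    # execute_script returns the JavaScript value; None/False both mean "not flagged"
    return bool(driver.execute_script("return navigator.webdriver"))
```

Calling `reports_webdriver_flag(driver)` after `driver.get(...)` in the code above would typically return `True` for a plain ChromeDriver session.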
answered Nov 15 '22 at 00:11 by undetected Selenium


I tried everything suggested here and nothing worked. Only this module worked for me:

https://github.com/ultrafunkamsterdam/undetected-chromedriver

I used it to load a website that had bot detection, after trying all the methods suggested in previous answers without success. Using the module is very straightforward and is described in the GitHub repository itself.
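The basic usage from the project's README amounts to replacing `webdriver.Chrome` with the package's `uc.Chrome`. A minimal sketch, assuming the package is installed (`pip install undetected-chromedriver`) and a local Chrome is available; the helper name `fetch_page_source` is hypothetical, and there is no guarantee it gets past any particular site's bot detection:

```python
def fetch_page_source(url: str) -> str:
    """Open `url` with undetected-chromedriver and return the rendered HTML."""
    # Imported inside the function to keep the third-party dependency optional
    import undetected_chromedriver as uc

    # uc.Chrome is used in place of selenium.webdriver.Chrome
    driver = uc.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if navigation fails
        driver.quit()
```

For example: `html = fetch_page_source("https://www.mouser.co.uk")`.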

Side note: moderators have deleted previous versions of this post several times, without good reason IMHO. I hope this edit gets through. Good luck.

answered Nov 15 '22 at 00:11 by Shirkan