
Selenium does not work with a chromedriver modified to avoid detection

I'm aware of this thread and this thread, and the others on the same subject, but the solution everyone points to in the first thread no longer works. So please do not close this just because that first thread exists. Its answer is from 2016, and more recent comments there report trouble with it.

I'm using Selenium to do some light web scraping. One site I'm interacting with is clearly detecting that my browser is automated (curiously, it only cares when I'm also accessing a version of the site outside my region, but that's neither here nor there).

The solution in the first thread suggests taking the chromedriver downloaded from here and modifying it. It says to get rid of mentions of variables with "$cdc" in them. So I do the following: download v2.41 from that site and unzip it. This version lets me use Chrome with Selenium via br = webdriver.Chrome('./chromedriver'), but has the automation detection problem. So I cp it to make chromedriver-modified.

I open chromedriver-modified with vim and search for $cdc. Around line 1934, I find a function similar to (but slightly different from) the one in the linked thread:

function getPageCache(opt_doc, opt_w3c) {
  var doc = opt_doc || document;
  var w3c = opt_w3c || false;
  // var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'xxxx_asdjflasutopfhvcZLmcfl_';
  // var key = 'randomblahhh_';
  if (w3c) {
    if (!(key in doc))
      doc[key] = new CacheWithUUID();
    return doc[key];
  } else {
    if (!(key in doc))
      doc[key] = new Cache();
    return doc[key];
  }
}

I've tried replacing this variable both with something random (the randomblahhh_ var) and with something that just replaces the first 4 characters of the $cdc one, because I saw both suggested in the comments in that thread (I don't know if some format for the variable is important here).

Neither works. What I mean by that is that when I try to run it with chromedriver-modified, the webdriver won't even start:

>>> from selenium import webdriver
>>> br = webdriver.Chrome(executable_path='./chromedriver-modified')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
    self.service.start()
  File "/usr/lib/python3/dist-packages/selenium/webdriver/common/service.py", line 96, in start
    self.assert_process_still_running()
  File "/usr/lib/python3/dist-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service ./chromedriver-modified unexpectedly exited. Status code was: -11

I've had trouble Googling and figuring out what this status code means. In fact, I found this unanswered reddit thread with the same exact problem.

The first thread also mentions $wdc variables, but I find no mention of them in chromedriver.

Just to preempt the possible suggestion, too: I'm almost 100% confident that it's detecting that I'm using an automated browser because it's automated, not because of something like mouse click speed or anything. If I start the browser with selenium but then manually do the rest, it still causes the problem.

edit: I'm using Chrome v68 from the Ubuntu repos, google-chrome-stable. To be honest I don't need to use Chrome specifically, but the answers I've found seem to center around it rather than Firefox.

edit2: one last comment -- I noticed in the first linked thread that some people were "recompiling":

For me, I used chrome, so, all that I had to do was to ensure that $cdc_ didn't exist anymore as document variable, and voila (download chromedriver source code, modify chromedriver and re-compile $cdc_ under different name.)

I'm not sure what that means -- are they recompiling Chrome itself? All I've done is change the variable in the chromedriver file.

GrundleMoof asked Aug 09 '18 15:08



1 Answer

You don't need to recompile anything unless you want to build a custom Chrome with specific features. Try changing '$cdc_asdjflasutopfhvcZLmcfl_' to '$abc_asdjflasutopfhvcZLmcfl_'. Remember not to comment this line out, and not to change it to a variable name of a different length. The compiled file is sensitive to this, and such a change can lead to a runtime error.
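As a rough illustration of the same-length replacement described above, here is a minimal Python sketch that patches the binary byte-for-byte instead of editing it in a text editor. The helper names (patch_marker, patch_file, describe_exit_status) are made up for this example and are not part of Selenium or chromedriver. It also assumes the marker appears in the binary exactly as quoted in the question. The last helper decodes a negative status such as -11: on POSIX, Python's subprocess module reports a process killed by signal N as -N, so -11 most likely means the patched driver segfaulted.

```python
import signal
from pathlib import Path

# Marker strings taken from the question and the answer; both are 28 bytes long.
OLD = b"$cdc_asdjflasutopfhvcZLmcfl_"
NEW = b"$abc_asdjflasutopfhvcZLmcfl_"


def patch_marker(data: bytes, old: bytes = OLD, new: bytes = NEW) -> bytes:
    """Replace every occurrence of `old` with `new` without changing the size."""
    if len(old) != len(new):
        # A different length would shift every byte after the marker.
        raise ValueError("replacement must have the same length as the original")
    return data.replace(old, new)


def patch_file(src: str, dst: str) -> int:
    """Write a patched copy of `src` to `dst`; return how many markers were found."""
    original = Path(src).read_bytes()
    Path(dst).write_bytes(patch_marker(original))
    Path(dst).chmod(0o755)  # keep the copy executable
    return original.count(OLD)


def describe_exit_status(code: int) -> str:
    """Explain a subprocess return code; negative values mean 'killed by signal'."""
    if code < 0:
        return f"killed by signal {-code} ({signal.Signals(-code).name})"
    return f"exited with status {code}"
```

With the file names from the question, patch_file('./chromedriver', './chromedriver-modified') would produce the modified copy, and a return value of 0 would mean the marker was not found in that build. Working on raw bytes keeps the file size and all offsets unchanged, which is the property the answer says the compiled file depends on.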

Kwoks answered Sep 28 '22 13:09