Python Selenium headless ChromeDriver not loading the full page, though it worked the day before with no changes to the code

I am using Selenium on Python 3.7.2 to scrape 9gag for a school project.

I am running Chrome 80.0.3987.122 on macOS, with the ChromeDriver release for Chrome 80. This is how I set up the driver:

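from selenium import webdriver
from selenium.webdriver.chrome.options import Options as c_opt

PATH_TO_DRIVER = '/path/to/chromedriver'  # placeholder: location of your chromedriver binary

options = c_opt()
options.headless = True  # run Chrome without a visible window
# 'options=' replaces the deprecated 'chrome_options=' keyword (Selenium 3 API)
driver = webdriver.Chrome(executable_path=PATH_TO_DRIVER, options=options)
driver.get('https://www.9gag.com')

# Dump the rendered DOM so it can be inspected afterwards
with open('source.html', 'w', encoding='utf-8') as f:
    f.write(driver.page_source)

driver.quit()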

Everything worked fine yesterday: I would run this code, open the source file, and see the first couple of 9gag articles. Since this morning, the saved source shows only a loading graphic, as if the JavaScript never finished loading.

I know this is not an issue with the website, since I tried again with a headless Firefox driver and a non-headless Chrome driver, and both worked as expected.

The driver does not show any errors as far as I can tell.

My main suspect is Chrome. Maybe it was updated somehow, and Selenium or the driver doesn't know how to handle the new version. I really need headless mode, since without it the Chrome window takes focus (this may be a Mac issue, but still).

Has anyone encountered this behavior?


UPDATE

I see that the issue happens only when I visit specific categories, for example https://9gag.com/funny. So I saved the output from that page, opened it in Chrome, and got the following: [screenshot of the saved page]

It seems that headless Chrome is hitting a CAPTCHA and cannot proceed to load the page. How could this have started happening only now, and is there anything that can be done? And why does geckodriver for Firefox somehow get past it (it has its own issues, but at least it loads the page)?
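Rather than opening source.html in a browser after every run, a small check can flag when the saved page looks like a challenge page instead of content. This is a minimal sketch under assumptions: `looks_blocked` and `CHALLENGE_MARKERS` are hypothetical names, and the marker strings are guesses rather than anything 9gag is known to serve, so they should be adjusted after inspecting a real blocked page.

```python
# Hypothetical marker strings; inspect a real blocked page and adjust these.
CHALLENGE_MARKERS = ("captcha", "are you a robot", "verify you are human")


def looks_blocked(html: str) -> bool:
    """Return True if the page source looks like a CAPTCHA/challenge page.

    Only a heuristic: it lower-cases the HTML and searches for any of the
    marker strings above.
    """
    text = html.lower()
    return any(marker in text for marker in CHALLENGE_MARKERS)


def check_saved_page(path: str = "source.html") -> bool:
    """Read a page saved from driver.page_source and report whether it looks blocked."""
    with open(path, encoding="utf-8") as f:
        return looks_blocked(f.read())
```

With the question's code, this could run as `looks_blocked(driver.page_source)` right after `driver.get(...)`, before the file is written.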

BelgishChoko asked Feb 26 '20 13:02


1 Answer

You can try adding these two flags to your options. The first one stops Chrome from exposing "navigator.webdriver=true" to JavaScript. Sites can read that property to check whether you're using automation, and then block you or make you solve a CAPTCHA.

The second one sets the user agent. Set it to something that looks like a regular browser: by default, headless Chrome identifies itself as "HeadlessChrome" in its user agent, which is another easy giveaway.

options.add_argument('disable-blink-features=AutomationControlled')
options.add_argument('user-agent=Type user agent here')
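One way to pick a user agent that looks legitimate is to take the one headless Chrome already reports and drop the "Headless" part, so the version still matches the installed browser. The sketch below assumes that approach; `browser_like_user_agent`, `build_options` and `make_driver` are hypothetical helper names, and it is written against the Selenium 3 API used in the question (on Selenium 4, pass `service=Service(path_to_driver)` instead of `executable_path`). Selenium is imported inside the functions so the string helper can be used without a browser.

```python
def browser_like_user_agent(headless_ua: str) -> str:
    """Turn headless Chrome's user agent into a regular-looking one.

    The original headless mode reports "HeadlessChrome/<version>";
    replacing that token with "Chrome" keeps the real version number.
    """
    return headless_ua.replace("HeadlessChrome", "Chrome")


def build_options(user_agent=None):
    """Headless Chrome options carrying the two flags from the answer."""
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    # First flag: keep navigator.webdriver from reporting true
    options.add_argument("--disable-blink-features=AutomationControlled")
    if user_agent:
        # Second flag: override the user agent the browser reports
        options.add_argument("--user-agent=" + user_agent)
    return options


def make_driver(path_to_driver: str):
    """Start a throwaway browser to read its user agent, then the real one."""
    from selenium import webdriver

    probe = webdriver.Chrome(executable_path=path_to_driver, options=build_options())
    try:
        headless_ua = probe.execute_script("return navigator.userAgent")
    finally:
        probe.quit()
    ua = browser_like_user_agent(headless_ua)
    return webdriver.Chrome(executable_path=path_to_driver, options=build_options(ua))
```

The probe launch costs an extra browser start; hard-coding a user agent string instead is simpler but can drift out of sync with the installed Chrome version after an update.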

Hopefully this helps.
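To confirm the flags actually took effect, you can read back the values the answer mentions from inside the page. This sketch assumes a driver object with Selenium's `execute_script` method; `automation_fingerprint` is a hypothetical helper name. With the first flag applied, navigator.webdriver should no longer come back as True.

```python
def automation_fingerprint(driver) -> dict:
    """Collect the values a site can use to spot automated headless Chrome.

    `driver` is any object with Selenium's execute_script method, e.g. a
    webdriver.Chrome instance after driver.get(...).
    """
    webdriver_flag = driver.execute_script("return navigator.webdriver")
    user_agent = driver.execute_script("return navigator.userAgent")
    return {
        # True means the page can tell the browser is automated
        "navigator.webdriver": webdriver_flag,
        "user_agent": user_agent,
        # The original headless mode advertises itself in the user agent
        "headless_in_ua": "HeadlessChrome" in user_agent,
    }
```

For example, `print(automation_fingerprint(driver))` after `driver.get('https://9gag.com/funny')`, run once with the flags and once without, shows whether they changed what the page sees.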

010011100101 answered Nov 15 '22 09:11