
Download files via Selenium headless Chrome in Python

Downloading files via headless Chrome with Selenium still seems to be a problem; it was asked here over a month ago and got no answer. I also don't understand how people are implementing the JS that appears in the bug thread. Is there an option I can add, or a current fix for this? The original bug page is located here. All of my stuff is up to date as of today, 10/22/17.

In Python:

from selenium import webdriver


options = webdriver.ChromeOptions()

prefs = {"download.default_directory": "C:/Stuff", 
         "download.prompt_for_download": False,
         "download.directory_upgrade": True, 
         "plugins.always_open_pdf_externally": True
         }

options.add_experimental_option("prefs", prefs)
options.add_argument('headless')
driver = webdriver.Chrome(r'C:/Users/aaron/chromedriver.exe', chrome_options = options)

# test file to download which doesn't work
driver.get('http://ipv4.download.thinkbroadband.com/5MB.zip')

If the headless option is removed, this works without any problem.

The actual files I'm attempting to download are PDFs located at .aspx URLs. I download them by calling .click() on the link, which works great, except in headless mode. The hrefs are JavaScript do_postback scripts.

Aaron Conway asked Dec 31 '25 10:12



2 Answers

Why don't you locate the anchor's href and then use a GET request to download the file? That way it will work in headless mode and will be much faster. I have done this in C#.

import requests


def download_file(url):
    # Use the last path segment of the URL as the local file name
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter: the body is fetched in chunks
    # instead of being loaded into memory all at once
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return local_filename
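
If the files sit behind a login, a plain GET will probably need the browser's session cookies, and a relative href has to be resolved against the page URL first. The following is a minimal sketch of those two steps, not a tested recipe for any particular site. The helper names `cookies_for_requests` and `resolve_href` and the example values are hypothetical; the cookie dicts mimic the shape returned by Selenium's `driver.get_cookies()`.

```python
from urllib.parse import urljoin


def cookies_for_requests(selenium_cookies):
    """Convert a list of cookie dicts (the shape returned by
    driver.get_cookies()) into a simple name -> value dict."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def resolve_href(page_url, href):
    """Turn a possibly relative href into an absolute URL."""
    return urljoin(page_url, href)


# Hand-written stand-ins for what Selenium would return
# (hypothetical values, not taken from a real site).
example_cookies = [
    {"name": "ASP.NET_SessionId", "value": "abc123", "domain": "example.com"},
    {"name": "auth", "value": "xyz", "domain": "example.com"},
]

cookies = cookies_for_requests(example_cookies)
url = resolve_href("https://example.com/reports/list.aspx", "files/report.pdf")
```

With a real driver, the inputs would come from `driver.get_cookies()`, `driver.current_url`, and `element.get_attribute('href')`, and the result could be passed on as `requests.get(url, cookies=cookies, stream=True)`. This only helps when the href is a real URL; it does not cover links whose href is a JavaScript postback.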
Anton Angelov answered Jan 02 '26 23:01



I believe that, now that Chromium supports this feature (per the bug ticket you linked), it falls to the ChromeDriver team to add support for it. There is an open ticket here, but it does not appear to have a high priority at the moment. Everyone who needs this feature, please go give it a +1!
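
For readers on a later Selenium release, where the Chrome driver exposes `execute_cdp_cmd`, the Chromium-side feature can be reached by sending the DevTools command `Page.setDownloadBehavior`. The following is a rough sketch under that assumption, not the ChromeDriver team's own fix. The helper names `download_behavior_params` and `enable_headless_downloads` are hypothetical, and `driver` is assumed to be an already created headless Chrome driver.

```python
import os


def download_behavior_params(download_dir):
    """Build the parameters for the DevTools command
    Page.setDownloadBehavior; Chrome expects an absolute path."""
    return {
        "behavior": "allow",
        "downloadPath": os.path.abspath(download_dir),
    }


def enable_headless_downloads(driver, download_dir):
    """Ask Chrome, through the driver, to allow downloads into download_dir.

    Assumption: `driver` is a Selenium Chrome driver that provides
    execute_cdp_cmd(); older releases do not have this method.
    """
    params = download_behavior_params(download_dir)
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

It would be called once after creating the driver and before triggering the download, for example `enable_headless_downloads(driver, "C:/Stuff")`. Whether it is still needed depends on the Chrome and ChromeDriver versions in use.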

jlarkins answered Jan 03 '26 01:01



