
Define download directory for chromedriver selenium with python

Everything is in the title!

Is there a way to define the download directory for selenium-chromedriver used with python?

Despite a lot of research, I haven't found anything conclusive. As a newbie, I've seen many things about "desired_capabilities" or "options" for ChromeDriver, but nothing has resolved my problem (and I still don't know whether they will!).

To explain my issue a little more: I have a lot of URLs to scan (200,000), and for each URL there is a file to download. I have to build a table with the URL, the information I scraped from it, AND the name of the file I've just downloaded for each page. Given the volume, I've created threads that open multiple instances of chromedriver to speed up processing. The problem is that every downloaded file lands in the same default directory, so I can no longer link a file to a URL. So the idea is to create a download directory for every thread and manage them one by one.

If someone has the answer to the question in the title, OR a workaround to identify the downloaded file and link it to the current URL, I will be grateful!

matlabat asked May 02 '13 01:05


2 Answers

For chromedriver1, create a new profile, set download.default_directory to the desired location inside that profile, and tell Chrome to use this profile via chrome.profile. The selenium-chromedriver package should have some methods for creating new profiles (at least the Ruby one does), as they need some special handling.

Chromedriver2 doesn't support setting the profile, but you can set preferences with it. If you want to set the download directory, this is how you do it:

prefs: { download: { default_directory: "/tmp" } }

The Ruby selenium-webdriver doesn't support this feature yet; the Python variant might, however.
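In Python, the same preference can be passed through ChromeOptions.add_experimental_option("prefs", ...). What follows is only a minimal sketch, assuming a recent Selenium release (where webdriver.Chrome accepts options=) and a chromedriver binary on the PATH. The helper names make_download_prefs and start_chrome_with_download_dir are made up for illustration, and download.prompt_for_download is an extra preference not mentioned above.

```python
import os


def make_download_prefs(download_dir):
    """Build the Chrome preferences dict for a given download directory."""
    return {
        # Chrome expects an absolute path here.
        "download.default_directory": os.path.abspath(download_dir),
        # Assumption: disable the "Save as" dialog so downloads start unattended.
        "download.prompt_for_download": False,
    }


def start_chrome_with_download_dir(download_dir):
    """Start one Chrome instance that saves downloads into download_dir."""
    # Imported lazily so make_download_prefs works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", make_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

For the multi-threaded case in the question, each thread could create its own directory (for example with tempfile.mkdtemp()) and pass it to start_chrome_with_download_dir, so that files from different threads never end up in the same folder.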

SztupY answered Sep 19 '22 12:09


I recently faced the same issue. I tried a lot of solutions found on the Internet, but none of them helped. So finally I came to this:

  • Launch chrome with an empty user-data-dir (in the /tmp folder) to let chrome initialize it
  • Quit chrome
  • Modify Default/Preferences in the newly created user-data-dir, adding these fields to the root object (just an example):

    "download": { "default_directory": "/tmp/tmpX7EADC.downloads", "directory_upgrade": true }

  • Launch chrome again with the same user-data-dir

Now it works just fine.
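The Preferences edit in the third step can be scripted. A minimal sketch, assuming Default/Preferences is the JSON file Chrome created on its first launch and that Chrome is not running while the file is modified; set_download_dir_in_profile is a hypothetical helper name:

```python
import json
import os


def set_download_dir_in_profile(user_data_dir, download_dir):
    """Add the "download" fields to Default/Preferences of a Chrome profile."""
    prefs_path = os.path.join(user_data_dir, "Default", "Preferences")

    # Load the preferences Chrome wrote when it initialized the profile.
    with open(prefs_path, "r", encoding="utf-8") as f:
        prefs = json.load(f)

    # Keep any existing "download" settings and override only these two fields.
    download = prefs.setdefault("download", {})
    download["default_directory"] = download_dir
    download["directory_upgrade"] = True

    with open(prefs_path, "w", encoding="utf-8") as f:
        json.dump(prefs, f)
    return prefs_path
```

Afterwards Chrome can be relaunched against the same profile, for example by passing "--user-data-dir=" followed by the directory as a command-line argument.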

Another tip: if you don't know the name of the file that is going to be downloaded, take a snapshot (a list of files) of the downloads directory, then download the file and find its name by comparing the snapshot with the current list of files in the downloads directory.
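The snapshot comparison could look roughly like this. It is a sketch, not a tested recipe: the function names, the timeout, and the polling interval are arbitrary choices, and skipping names ending in .crdownload assumes Chrome's usual suffix for downloads that are still in progress.

```python
import os
import time


def snapshot(download_dir):
    """Return the set of file names currently in the downloads directory."""
    return set(os.listdir(download_dir))


def wait_for_new_files(download_dir, before, timeout=30.0, poll=0.5):
    """Return the sorted names that appeared since `before`, or [] on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        # Ignore partial downloads that Chrome has not finished yet.
        new = {
            name
            for name in snapshot(download_dir) - before
            if not name.endswith(".crdownload")
        }
        if new:
            return sorted(new)
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)
```

Take the snapshot just before triggering the download, then call wait_for_new_files with it; with one download directory per browser instance, the returned name can be recorded next to the current URL.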

Dmitry Nedbaylo answered Sep 19 '22 12:09