I've written a script in Python, in combination with Selenium, to download a few document files (ending in .doc) from a webpage. The reason I don't want to use the requests or urllib module to download the files is that the website I'm currently playing with does not have a true URL connected to each file. The links are JavaScript-encrypted. However, I've chosen a link within my script that mimics the same behavior.
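What my script does at the moment: it creates a master folder on the Desktop, creates a new folder inside it named after each file to be downloaded, and downloads the files into the master folder by clicking their links (this last part is what I need rectified).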
How can I modify my script so that it downloads the files by clicking their links and puts each downloaded file in its own folder?
This is my attempt so far:
    import os
    import time
    from selenium import webdriver

    link = 'https://www.online-convert.com/file-format/doc'

    dirf = os.path.expanduser('~')
    desk_location = dirf + r'\Desktop\file_folder'
    if not os.path.exists(desk_location):
        os.mkdir(desk_location)

    def download_files():
        driver.get(link)
        for item in driver.find_elements_by_css_selector("a[href$='.doc']")[:2]:
            filename = item.get_attribute("href").split("/")[-1]
            # create a new folder named after the file to store the downloaded file in
            folder_name = item.get_attribute("href").split("/")[-1].split(".")[0]
            # set the location of the folder to be created
            new_location = os.path.join(desk_location, folder_name)
            if not os.path.exists(new_location):
                os.mkdir(new_location)
            # set the location the downloaded file should end up in
            file_location = os.path.join(new_location, filename)
            item.click()

            time_to_wait = 10
            time_counter = 0

            try:
                while not os.path.exists(file_location):
                    time.sleep(1)
                    time_counter += 1
                    if time_counter > time_to_wait:
                        break
            except Exception:
                pass

    if __name__ == '__main__':
        chromeOptions = webdriver.ChromeOptions()
        prefs = {
            'download.default_directory': desk_location,
            'profile.default_content_setting_values.automatic_downloads': 1,
        }
        chromeOptions.add_experimental_option('prefs', prefs)
        driver = webdriver.Chrome(chrome_options=chromeOptions)
        download_files()
The following image represents how the downloaded files are currently stored (the files are outside their respective folders):
I just added a rename of the file to move it. It will work just as you have it, but once the file has downloaded, the script moves it to the correct path:
os.rename(desk_location + '\\' + filename, file_location)
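To show the move step on its own, here is a minimal sketch that runs against a temporary directory rather than the Desktop. The helper name `move_into_own_folder` and the file name `example.doc` are hypothetical and only for illustration. It builds paths with `os.path.join` instead of a hard-coded backslash and uses `shutil.move`, which also copes with a destination on a different drive. Note that `os.path.splitext` strips only the final extension, so for names containing several dots it behaves slightly differently from `split(".")[0]`.

```python
import os
import shutil
import tempfile


def move_into_own_folder(download_dir, filename):
    """Move download_dir/filename into download_dir/<name without extension>/."""
    folder_name = os.path.splitext(filename)[0]
    target_dir = os.path.join(download_dir, folder_name)
    os.makedirs(target_dir, exist_ok=True)  # no error if the folder already exists
    source = os.path.join(download_dir, filename)
    destination = os.path.join(target_dir, filename)
    shutil.move(source, destination)
    return destination


# Demonstration with a stand-in file in a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "example.doc"), "w") as handle:
    handle.write("placeholder")
moved_path = move_into_own_folder(demo_dir, "example.doc")
print(moved_path)
```

In the real script, the call would go right after the download has finished, with `desk_location` as `download_dir`.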
Full Code:
    import os
    import time
    from selenium import webdriver

    # Note: newer Selenium 4 releases use driver.find_elements(By.CSS_SELECTOR, ...)
    # and webdriver.Chrome(options=...) in place of the older calls used below.

    link = 'https://www.online-convert.com/file-format/doc'

    dirf = os.path.expanduser('~')
    desk_location = dirf + r'\Desktop\file_folder'
    if not os.path.exists(desk_location):
        os.mkdir(desk_location)

    def download_files():
        driver.get(link)
        for item in driver.find_elements_by_css_selector("a[href$='.doc']")[:2]:
            filename = item.get_attribute("href").split("/")[-1]
            # create a new folder named after the file to store the downloaded file in
            folder_name = filename.split(".")[0]
            new_location = os.path.join(desk_location, folder_name)
            if not os.path.exists(new_location):
                os.mkdir(new_location)
            # Chrome first saves the file in the default download directory
            downloaded_location = os.path.join(desk_location, filename)
            # set the location the downloaded file should end up in
            file_location = os.path.join(new_location, filename)
            item.click()

            time_to_wait = 10
            time_counter = 0

            # wait for the download to appear in the default directory; polling
            # file_location here would always time out, because that path
            # cannot exist before the move
            while not os.path.exists(downloaded_location):
                time.sleep(1)
                time_counter += 1
                if time_counter > time_to_wait:
                    break
            try:
                # move the downloaded file into its own folder
                os.rename(downloaded_location, file_location)
            except OSError:
                pass

    if __name__ == '__main__':
        chromeOptions = webdriver.ChromeOptions()
        prefs = {
            'download.default_directory': desk_location,
            'profile.default_content_setting_values.automatic_downloads': 1,
        }
        chromeOptions.add_experimental_option('prefs', prefs)
        driver = webdriver.Chrome(chrome_options=chromeOptions)
        download_files()
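If the fixed one-second loop in the full code turns out to be fragile for larger files, one option is a small polling helper. The sketch below is only an illustration: the name `wait_for_download` and its parameters are hypothetical, and it assumes that Chrome keeps in-progress downloads in files ending in `.crdownload` inside the download directory. It reports success only once the target file exists and no such partial file is left.

```python
import os
import tempfile
import time


def wait_for_download(path, timeout=10.0, poll_interval=0.5):
    """Return True once `path` exists and no .crdownload file is left beside it."""
    directory = os.path.dirname(path) or "."
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: Chrome marks unfinished downloads with a .crdownload suffix.
        still_downloading = any(
            name.endswith(".crdownload") for name in os.listdir(directory)
        )
        if os.path.exists(path) and not still_downloading:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Demonstration with a stand-in file instead of a real browser download.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "example.doc")
missing = wait_for_download(demo_file, timeout=0.2, poll_interval=0.05)
with open(demo_file, "w") as handle:
    handle.write("placeholder")
present = wait_for_download(demo_file, timeout=0.2, poll_interval=0.05)
print(missing, present)
```

In the script, the helper would be called after `item.click()` with the path inside `desk_location`, and the file would be moved only when it returns True.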