 

python selenium, find out when a download has completed?

I've used Selenium to initiate a download. After the download is complete, certain actions need to be taken. Is there a simple way to find out when a download has completed? (I am using the Firefox driver.)

applecider asked Dec 17 '15 15:12




4 Answers

I came across this problem recently. I was downloading multiple files at once and had to build in a way to time out if the downloads failed.

The code below checks the filenames in the download directory once per second and exits when no partial downloads remain or after 20 seconds, whichever comes first. I used the returned elapsed time to tell whether the downloads succeeded or the wait timed out.

import time
import os

def download_wait(path_to_downloads):
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < 20:
        time.sleep(1)
        dl_wait = False
        for fname in os.listdir(path_to_downloads):
            if fname.endswith('.crdownload'):
                dl_wait = True
        seconds += 1
    return seconds

I believe that this only works with Chrome, whose in-progress downloads end with the .crdownload extension. There may be a similar way to check in other browsers.

Edit: I recently changed this function to handle cases where .crdownload does not appear as the extension. Essentially, it now also waits for the expected number of files.

def download_wait(directory, timeout, nfiles=None):
    """
    Wait for downloads to finish with a specified timeout.

    Args
    ----
    directory : str
        The path to the folder where the files will be downloaded.
    timeout : int
        How many seconds to wait until timing out.
    nfiles : int, defaults to None
        If provided, also wait for the expected number of files.

    """
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < timeout:
        time.sleep(1)
        dl_wait = False
        files = os.listdir(directory)
        if nfiles and len(files) != nfiles:
            dl_wait = True

        for fname in files:
            if fname.endswith('.crdownload'):
                dl_wait = True

        seconds += 1
    return seconds
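
Since the question uses the Firefox driver, the same polling idea can probably be adapted by also checking for the .part suffix that Firefox gives in-progress downloads. The following is a minimal sketch under that assumption; the name `wait_for_downloads`, the `poll` parameter, and the `PARTIAL_SUFFIXES` tuple are illustrative choices, not part of the answer above.

```python
import os
import time

# Suffixes browsers append to in-progress downloads (assumption:
# '.crdownload' for Chrome, '.part' for Firefox).
PARTIAL_SUFFIXES = ('.crdownload', '.part')


def wait_for_downloads(directory, timeout=20, poll=1.0):
    """Return True once no partial files remain, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        # str.endswith accepts a tuple, so one call covers every suffix.
        if not any(name.endswith(PARTIAL_SUFFIXES) for name in names):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Note that this returns True immediately if the browser has not yet created the partial file, so it is safest to call it a moment after triggering the download, or to combine it with a file-count check like `nfiles` above.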
Austin Mackillop answered Sep 25 '22 01:09


Selenium has no built-in way to wait for a download to complete.


The general idea is to wait until the file appears in your "Downloads" directory.

This can be achieved either by checking for the file's existence in a loop:

  • Check and wait until a file exists to read it

Or by using a tool like watchdog to monitor the directory:

  • How to watch a directory for changes?
  • Monitoring contents of files/directories?
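
The looping approach might look roughly like the sketch below. It assumes you know the final filename in advance; `wait_for_file`, `timeout`, and `poll` are hypothetical names chosen for illustration. Treating a stable file size as "finished" is only a heuristic: a stalled download, or a browser that creates a placeholder file first, can fool it.

```python
import os
import time


def wait_for_file(path, timeout=30, poll=0.5):
    """Wait until `path` exists and its size stops changing.

    Returns True on success and False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = None  # file does not exist (yet)
        # Two consecutive identical sizes suggest the write has finished.
        if size is not None and size == last_size:
            return True
        last_size = size
        time.sleep(poll)
    return False
```

A call such as `wait_for_file(os.path.join(download_dir, 'report.pdf'))` would then sit between the click that starts the download and the code that reads the file; `download_dir` and the filename here are placeholders.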
alecxe answered Sep 23 '22 01:09


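import os
import time

def latest_download_file():
    """Return the name of the most recently modified file in the downloads folder."""
    path = r'Downloads folder file path'
    # Use full paths so getmtime works without changing the working directory.
    files = sorted(os.listdir(path),
                   key=lambda name: os.path.getmtime(os.path.join(path, name)))
    newest = files[-1]

    return newest

# Poll once a second until the newest file is no longer a partial Chrome download.
fileends = "crdownload"
while fileends == "crdownload":
    time.sleep(1)
    newest_file = latest_download_file()
    if newest_file.endswith(".crdownload"):
        fileends = "crdownload"
    else:
        fileends = "none"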

This is a combination of a few solutions. I didn't like having to scan the entire downloads folder for a file ending in "crdownload". This code implements a function that returns the newest file in the downloads folder, then simply checks whether that file is still being downloaded. I used it for a Selenium tool I am building, and it worked very well.

Red answered Sep 21 '22 01:09


I know it's late for an answer, but I would like to share a hack for future readers.

You can create a thread, say thread1, from the main thread and start your download there. Then create another thread, say thread2, and have it wait until thread1 completes using the join() method. From there, you can continue your flow of execution once the download completes.

Make sure, though, that you don't initiate the download using Selenium. Instead, extract the link using Selenium and use the requests module to download the file.

Download using requests module

For example:

import threading

def downloadit():
    # download code here (e.g. using the requests module)
    pass

def after_dwn():
    dwn_thread.join()           # waits until the download thread has finished
    # next chunk of code after the download goes here

dwn_thread = threading.Thread(target=downloadit)
dwn_thread.start()

metadata_thread = threading.Thread(target=after_dwn)
metadata_thread.start()
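
If nothing else has to run while the download is in progress, the second thread is arguably optional: calling join() from the main thread blocks in the same way, and passing a timeout keeps a stalled download from hanging the script. Below is a minimal sketch of that variant. `fake_download` is a hypothetical stand-in for real download code, and the `results` dict exists only for illustration.

```python
import threading
import time

results = {}


def fake_download(url):
    # Hypothetical stand-in for real download code (e.g. a requests call).
    time.sleep(0.1)
    results[url] = 'done'


def download_and_wait(url, timeout=5):
    """Run the download in a worker thread; return True if it finished in time."""
    worker = threading.Thread(target=fake_download, args=(url,), daemon=True)
    worker.start()
    worker.join(timeout)  # blocks until the thread finishes or the timeout elapses
    return not worker.is_alive()
```

The return value distinguishes a completed download from a timed-out one, much like the elapsed-seconds check in the polling answers.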
Dhyey Shah answered Sep 21 '22 01:09