
Get size of a file before downloading in Python

Tags: python, urllib

I'm downloading an entire directory from a web server. It works OK, but I can't figure out how to get the file size before downloading, so that I can tell whether the file was updated on the server. Can this be done the way it would be if I were downloading the file from an FTP server?

    import urllib
    import re

    url = "http://www.someurl.com"

    # Download the page locally
    f = urllib.urlopen(url)
    html = f.read()
    f.close()

    f = open ("temp.htm", "w")
    f.write (html)
    f.close()

    # List only the .TXT / .ZIP files
    fnames = re.findall('^.*<a href="(\w+(?:\.txt|.zip)?)".*$', html, re.MULTILINE)

    for fname in fnames:
        print fname, "..."

        f = urllib.urlopen(url + "/" + fname)

        #### Here I want to check the filesize to download or not ####

        file = f.read()
        f.close()

        f = open (fname, "w")
        f.write (file)
        f.close()

@Jon: thanks for your quick answer. It works, but the file size reported by the web server is slightly smaller than the size of the downloaded file.

Examples:

    Local Size  Server Size
     2.223.533    2.115.516
       664.603      662.121

Does it have anything to do with the CR/LF conversion?

PabloG asked Aug 08 '08 13:08




1 Answer

I have reproduced what you are seeing:

    import urllib, os
    link = "http://python.org"
    print "opening url:", link
    site = urllib.urlopen(link)
    meta = site.info()
    print "Content-Length:", meta.getheaders("Content-Length")[0]

    f = open("out.txt", "r")
    print "File on disk:",len(f.read())
    f.close()

    f = open("out.txt", "w")
    f.write(site.read())
    site.close()
    f.close()

    f = open("out.txt", "r")
    print "File on disk after download:",len(f.read())
    f.close()

    print "os.stat().st_size returns:", os.stat("out.txt").st_size

Outputs this:

    opening url: http://python.org
    Content-Length: 16535
    File on disk: 16535
    File on disk after download: 16535
    os.stat().st_size returns: 16861

What am I doing wrong here? Is os.stat().st_size not returning the correct size?


Edit: OK, I figured out what the problem was:

    import urllib, os
    link = "http://python.org"
    print "opening url:", link
    site = urllib.urlopen(link)
    meta = site.info()
    print "Content-Length:", meta.getheaders("Content-Length")[0]

    f = open("out.txt", "rb")
    print "File on disk:",len(f.read())
    f.close()

    f = open("out.txt", "wb")
    f.write(site.read())
    site.close()
    f.close()

    f = open("out.txt", "rb")
    print "File on disk after download:",len(f.read())
    f.close()

    print "os.stat().st_size returns:", os.stat("out.txt").st_size

This outputs:

    $ python test.py
    opening url: http://python.org
    Content-Length: 16535
    File on disk: 16535
    File on disk after download: 16535
    os.stat().st_size returns: 16535
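The 326-byte gap in the first run (16861 versus 16535) is what you would expect if text mode added one carriage return per line feed. Here is a minimal Python 3 sketch of that effect. The 12-byte payload is made up, and `newline="\r\n"` is used to mimic Windows text-mode translation on any platform; both are assumptions for illustration, not part of the original test.

```python
import os
import tempfile

# Hypothetical payload standing in for a downloaded page: 12 bytes, 3 line feeds.
payload = b"one\ntwo\nsix\n"

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.txt")
    binary_path = os.path.join(tmp, "binary.txt")

    # Text mode with newline="\r\n" writes every "\n" as "\r\n",
    # which is what Windows text mode does by default.
    with open(text_path, "w", newline="\r\n") as f:
        f.write(payload.decode("ascii"))

    # Binary mode writes the bytes exactly as received.
    with open(binary_path, "wb") as f:
        f.write(payload)

    text_size = os.path.getsize(text_path)      # payload plus one "\r" per "\n"
    binary_size = os.path.getsize(binary_path)  # same as len(payload)

print(len(payload), text_size, binary_size)  # prints: 12 15 12
```

The text-mode file is larger by exactly the number of line feeds, while the binary-mode file matches the payload length.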

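Make sure you open both files in binary mode for reading and writing. In text mode on Windows, each line feed is written as CR/LF, which inflates the size on disk.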

    # open for binary write
    open(filename, "wb")
    # open for binary read
    open(filename, "rb")
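To apply this to the original question of checking the size before downloading, something like the following could work in Python 3. It is a sketch, not a drop-in replacement: `remote_size`, `needs_download`, and `download` are hypothetical helper names, and it assumes the server answers `HEAD` requests and reports `Content-Length`, which not every server does.

```python
import os
import urllib.request


def remote_size(url, timeout=10):
    """Return the Content-Length the server reports for url, or None if absent.

    Sends a HEAD request, so only the headers are transferred, not the body.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def needs_download(local_path, size_on_server):
    """Return True if the local copy is missing, the server size is unknown,
    or the two sizes differ."""
    if size_on_server is None or not os.path.exists(local_path):
        return True
    return os.path.getsize(local_path) != size_on_server


def download(url, local_path, timeout=10):
    """Fetch url and write it in binary mode so the size on disk matches
    the number of bytes received. Returns that byte count."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(local_path, "wb") as f:
        f.write(data)
    return len(data)
```

A loop over the file names would call `remote_size(url + "/" + fname)`, pass the result to `needs_download(fname, size)`, and call `download` only when that returns True. Matching sizes do not prove the content is unchanged, so treat this as a cheap first check only.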
Jonathan Works answered Oct 15 '22 06:10
