I'm downloading an entire directory from a web server. It works OK, but I can't figure how to get the file size before download to compare if it was updated on the server or not. Can this be done as if I was downloading the file from a FTP server?
import urllib import re url = "http://www.someurl.com" # Download the page locally f = urllib.urlopen(url) html = f.read() f.close() f = open ("temp.htm", "w") f.write (html) f.close() # List only the .TXT / .ZIP files fnames = re.findall('^.*<a href="(\w+(?:\.txt|.zip)?)".*$', html, re.MULTILINE) for fname in fnames: print fname, "..." f = urllib.urlopen(url + "/" + fname) #### Here I want to check the filesize to download or not #### file = f.read() f.close() f = open (fname, "w") f.write (file) f.close()
@Jon: thank for your quick answer. It works, but the filesize on the web server is slightly less than the filesize of the downloaded file.
Examples:
Local Size Server Size 2.223.533 2.115.516 664.603 662.121
It has anything to do with the CR/LF conversion?
Use the os. path. getsize('file_path') function to check the file size. Pass the file name or file path to this function as an argument.
you can get a header called Content-Length form the HTTP Response object that you get, this will give you the length of the file. you should note though, that some servers don't return that information, and the only way to know the actual size is to read everything from the response.
Explanation: The function filesize() returns the size of the specified file and it returns the file size in bytes on success or FALSE on failure.
I have reproduced what you are seeing:
import urllib, os link = "http://python.org" print "opening url:", link site = urllib.urlopen(link) meta = site.info() print "Content-Length:", meta.getheaders("Content-Length")[0] f = open("out.txt", "r") print "File on disk:",len(f.read()) f.close() f = open("out.txt", "w") f.write(site.read()) site.close() f.close() f = open("out.txt", "r") print "File on disk after download:",len(f.read()) f.close() print "os.stat().st_size returns:", os.stat("out.txt").st_size
Outputs this:
opening url: http://python.org Content-Length: 16535 File on disk: 16535 File on disk after download: 16535 os.stat().st_size returns: 16861
What am I doing wrong here? Is os.stat().st_size not returning the correct size?
Edit: OK, I figured out what the problem was:
import urllib, os link = "http://python.org" print "opening url:", link site = urllib.urlopen(link) meta = site.info() print "Content-Length:", meta.getheaders("Content-Length")[0] f = open("out.txt", "rb") print "File on disk:",len(f.read()) f.close() f = open("out.txt", "wb") f.write(site.read()) site.close() f.close() f = open("out.txt", "rb") print "File on disk after download:",len(f.read()) f.close() print "os.stat().st_size returns:", os.stat("out.txt").st_size
this outputs:
$ python test.py opening url: http://python.org Content-Length: 16535 File on disk: 16535 File on disk after download: 16535 os.stat().st_size returns: 16535
Make sure you are opening both files for binary read/write.
// open for binary write open(filename, "wb") // open for binary read open(filename, "rb")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With