I have a task to download GBs of data from a website. The data is in the form of .gz files, each file being 45 MB in size.
The easy way to get the files is to use "wget -r -np -A files url". This will download the data recursively and mirror the website. The download rate is very high: 4 MB/sec.
But, just to play around, I was also using Python to build my own URL parser and downloader.
Downloading via Python's urlretrieve is damn slow, possibly 4 times as slow as wget. The download rate is 500 KB/sec. I use HTMLParser for parsing the href tags.
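To give an idea, here is a minimal sketch of the kind of HTMLParser-based link extraction I mean (simplified; the index URL below is just a placeholder):

from HTMLParser import HTMLParser
import urllib

class LinkParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # collect the href attribute of every <a> tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

# 'http://example.com/data/' stands in for the real index page
parser = LinkParser()
parser.feed(urllib.urlopen('http://example.com/data/').read())
print parser.links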
I am not sure why this is happening. Are there any settings for this?
Thanks
urllib works for me as fast as wget. Try this code: it shows the progress in percentage, just like wget.
import sys, urllib

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size, c: total size of the file
    # ',' at the end of the line is important!
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
    # you can also use sys.stdout.write
    # sys.stdout.write("\r% 3.1f%% of %d bytes"
    #                  % (min(100, float(a * b) / c * 100), c))
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind('/')
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
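Call it with the file URLs as arguments, for example (the script name and URLs are just placeholders):

python fetch.py http://example.com/data/part1.gz http://example.com/data/part2.gz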