Is there a way to improve url download speed on Python?
I have a program that I wrote in VB6 that smokes Python without trying. I've converted it over and I'm trying it out right now, and it seems much slower in Python (Linux) — about twice as long. Even the initial version of the program seemed to take longer than what I was used to it taking on Windows.
I've tried urllib (2.7), urllib.request (3.3), and requests. Currently I'm trying urllib3, and it isn't any faster either. What would normally take 45 minutes on Windows looks like it would take close to 2 hours on Linux to accomplish the same task on the same computer and the same internet connection. The task is simply iterating over a span of potential file names and downloading whichever files exist.
I'll also ask, since it has happened more than once this afternoon: how do I detect error code 110 (connection timed out)? What I'm using below doesn't work, and the error has still killed the program.
import os
import urllib.request
import urllib3

http = urllib3.PoolManager()

def dl_10(self):
    self.NxtNum10 = int(self.HiStr10)
    while self.NxtNum10 < int(self.HiStr10) + 9999:
        url = 'http://www.example.com/videos/encoded/' + str(self.NxtNum10) + '.mp4'
        r = http.request('GET', url)
        if r.status == 404:
            self.NxtNum10 += 1
            continue
        elif r.status == 110:  # intended to catch connection timeouts (errno 110)
            continue
        else:
            urllib.request.urlretrieve(url, str(self.NxtNum10) + '_1.mp4')
            statinfo = os.stat(str(self.NxtNum10) + '_1.mp4')
            if statinfo.st_size < 10000:
                # too small to be a real video; discard it
                os.remove(str(self.NxtNum10) + '_1.mp4')
            else:
                self.End10 = self.NxtNum10
            self.NxtNum10 += 1
            self.counter += 1
    self.NxtNum10 = 'FINISHED'
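For what it's worth, the `r.status == 110` check above can never fire: 110 is an OS-level errno (`ETIMEDOUT` on Linux), not an HTTP status code, so a timed-out connection surfaces as an exception rather than a response. A minimal sketch of catching it with `try`/`except` (the `fetch_status` and `flaky_open` names are hypothetical stand-ins for the real request call):

```python
import errno
import socket

def fetch_status(open_fn):
    """Call open_fn(); report 'timeout' instead of dying on errno 110."""
    try:
        return open_fn()
    except socket.timeout:
        return 'timeout'
    except OSError as e:
        if e.errno == errno.ETIMEDOUT:  # errno 110 on Linux
            return 'timeout'
        raise

# Simulated request that times out, standing in for http.request('GET', url):
def flaky_open():
    raise OSError(errno.ETIMEDOUT, 'Connection timed out')

print(fetch_status(flaky_open))  # → timeout
```

In the real loop you'd wrap the `http.request(...)` call this way and `continue` (or retry with backoff) when it returns `'timeout'`.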
This is being run through threads; I wouldn't think that should make any difference. Like I said, the initial write-up using urllib (2.7) was slow as well, and it wasn't using threads: I was just running the program 10 times, just like I always have on Windows.
Is there any faster way of grabbing stuff from the internet with Python?
In summary: Python code is slowed down by the compilation and interpretation that occurs at runtime. Compare this to a statically typed, compiled language, which runs just the CPU instructions once compiled. It's actually possible to extend Python with compiled modules that are written in C.
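You can see this effect without writing any C yourself, since many builtins are already C-implemented. A rough illustration (timings will vary by machine):

```python
import timeit

def py_sum(n):
    """Pure-Python loop: every iteration goes through the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total

n = 100000
t_py = timeit.timeit(lambda: py_sum(n), number=20)
t_c = timeit.timeit(lambda: sum(range(n)), number=20)  # sum() is implemented in C
print(t_c < t_py)  # the C-implemented sum is typically several times faster
```

That said, a download job like this one is mostly waiting on the network, so interpreter overhead is unlikely to be the whole story.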
I find that instead of using urlretrieve directly, the following method is much faster:
import urllib2

resp = urllib2.urlopen(url)        # open the connection
respHtml = resp.read()             # read the whole response body into memory
binfile = open(filename, "wb")
binfile.write(respHtml)            # write the file directly
binfile.close()
This writes the file directly. Hope it can help.
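One caveat with `resp.read()` is that it holds the entire file in memory. For larger downloads, a streaming variant with `shutil.copyfileobj` copies the response in fixed-size chunks instead. A Python 3 sketch, demonstrated against a local `file://` URL so it runs without a network connection (the paths here are temporary stand-ins for a real download URL and filename):

```python
import os
import shutil
import tempfile
import urllib.request

# Create a local 100,000-byte file to stand in for a remote resource.
src = tempfile.NamedTemporaryFile(delete=False, suffix='.bin')
src.write(b'x' * 100000)
src.close()

# Stream it to a destination file in 64 KiB chunks instead of one read().
dst_path = src.name + '.copy'
with urllib.request.urlopen('file://' + src.name) as resp, \
        open(dst_path, 'wb') as out:
    shutil.copyfileobj(resp, out, length=64 * 1024)

copied = os.path.getsize(dst_path)
print(copied)  # → 100000
```

With a real `http://` URL the same `urlopen` + `copyfileobj` pattern applies; peak memory stays at one chunk rather than the whole file.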