
python downloading is extremely slow

Tags:

python

Is there a way to improve url download speed on Python?

I have a program that I wrote in VB6 that smokes Python without trying. I've converted it over and am trying it out right now, and it seems much slower in Python (Linux), taking twice as long. Even the initial version of the program seemed to take longer than I was used to on Windows.

I've tried urllib (2.7), urllib.request (3.3), and requests. Currently I'm trying urllib3, and it isn't any faster either. What would normally take 45 minutes on Windows looks like it will take close to 2 hours on Linux to accomplish the same task, on the same computer, on the same internet connection. The task is simply searching the internet and downloading files when the search finds what it is looking for: a span of potential file names.

I'll also ask, since it has happened more than once this afternoon: how do I detect error code 110 (connection timed out)? What I'm using below doesn't catch it, and the timeout has still killed the program.

import os
import urllib.request
import urllib3

http = urllib3.PoolManager()

def dl_10(self):
    self.NxtNum10 = int(self.HiStr10)
    while self.NxtNum10 < int(self.HiStr10) + 9999:
        url = 'http://www.example.com/videos/encoded/' + str(self.NxtNum10) + '.mp4'
        r = http.request('GET', url)
        if r.status == 404:
            self.NxtNum10 += 1
            continue
        elif r.status == 110:  # meant to catch "connection timed out"
            continue
        else:
            # download the file, then discard it if it is too small to be real
            urllib.request.urlretrieve(url, str(self.NxtNum10) + '_1.mp4')
            statinfo = os.stat(str(self.NxtNum10) + '_1.mp4')
            if statinfo.st_size < 10000:
                os.remove(str(self.NxtNum10) + '_1.mp4')
            else:
                self.End10 = self.NxtNum10
        self.NxtNum10 += 1

    self.counter += 1
    self.NxtNum10 = 'FINISHED'
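For what it's worth, errno 110 is an operating-system error, not an HTTP status code, so it will never appear in r.status; urllib3 raises an exception when a connection times out. A minimal sketch of catching it (timeout values and the fetch helper are placeholders, not from the original post):

import urllib3

http = urllib3.PoolManager(timeout=urllib3.Timeout(connect=5.0, read=10.0))

def fetch(url):
    try:
        # timeouts surface as exceptions, not as a status code on the response
        return http.request('GET', url)
    except urllib3.exceptions.MaxRetryError:
        # connection timed out (errno 110) or retries were exhausted
        return None
    except urllib3.exceptions.ReadTimeoutError:
        # the server accepted the connection but stopped sending data
        return None

A fetch that returns None can then be skipped or retried in the download loop instead of crashing the program.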

This is being run through threads; I wouldn't think that should make any difference. Like I said, the initial version using urllib (2.7) was slow as well, and it wasn't using threads: I was just running the program 10 times, like I always have on Windows.

Is there any faster way of grabbing stuff from the internet with Python?
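One common pattern for this kind of probing (a sketch, not the original program: the URL pattern, worker count, and helper names are placeholders) is to share one connection pool across a thread pool, and use HEAD requests so existence checks don't download the whole body:

import concurrent.futures
import urllib3

http = urllib3.PoolManager(maxsize=10)  # reuse connections across threads

def probe(num):
    # hypothetical URL pattern, modeled on the question
    url = 'http://www.example.com/videos/encoded/%d.mp4' % num
    try:
        # HEAD checks whether the file exists without transferring the body
        r = http.request('HEAD', url)
        return num if r.status == 200 else None
    except urllib3.exceptions.HTTPError:
        return None

def find_files(start, count, workers=10):
    # fan the candidate numbers out over a thread pool
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(probe, range(start, start + count))
    return [n for n in results if n is not None]

Keeping one PoolManager alive for the whole run matters: reconnecting for every request is often what makes a naive loop slow.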

asked Jan 17 '14 by confused



1 Answer

I find that instead of using urlretrieve directly, the following method is much faster:

import urllib2

resp = urllib2.urlopen(url)       # open the URL
respHtml = resp.read()            # read the whole body into memory
binfile = open(filename, "wb")
binfile.write(respHtml)           # write it out in one go
binfile.close()

This writes the file directly. Hope it can help.
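For large files, a variant of the same idea (assuming Python 3, so urllib.request instead of urllib2) streams the body to disk in chunks instead of holding it all in memory:

import shutil
import urllib.request

def download(url, filename):
    # copy the response to disk in 64 KiB chunks rather than read()-ing it whole
    with urllib.request.urlopen(url) as resp, open(filename, 'wb') as out:
        shutil.copyfileobj(resp, out, length=64 * 1024)

This keeps memory flat no matter how big the .mp4 is, and the with blocks close both handles even if the transfer fails partway.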

answered Sep 28 '22 by 郑穗展