Is there a way to improve url download speed on Python?
I have a program that I wrote in VB6 that smokes Python without trying. I've converted it over and I'm trying it out right now, and it seems much slower in Python (Linux) — about twice as long. Even the initial version of the program seemed to take longer than what I was used to it taking on Windows.
I've tried urllib (2.7), urllib.request (3.3), and requests. Currently I'm trying urllib3, and it isn't any faster either. What would normally take 45 minutes on Windows looks like it would take close to 2 hours on Linux to accomplish the same task on the same computer and the same internet connection. The task is simply iterating over a span of potential file names and downloading whichever files exist.
I'll also ask, since it has happened more than once this afternoon: how do I detect error code 110 (connection timed out)? What I'm using below doesn't work, and the error has still killed the program.
import os
import urllib.request
import urllib3

http = urllib3.PoolManager()

def dl_10(self):
    self.NxtNum10 = int(self.HiStr10)
    while self.NxtNum10 < int(self.HiStr10) + 9999:
        url = 'http://www.example.com/videos/encoded/' + str(self.NxtNum10) + '.mp4'
        r = http.request('GET', url)
        if r.status == 404:
            self.NxtNum10 += 1
            continue
        elif r.status == 110:  # intended to catch connection timeouts (errno 110)
            continue
        else:
            urllib.request.urlretrieve(url, str(self.NxtNum10) + '_1.mp4')
            statinfo = os.stat(str(self.NxtNum10) + '_1.mp4')
            if statinfo.st_size < 10000:
                # too small to be a real video; discard it
                os.remove(str(self.NxtNum10) + '_1.mp4')
            else:
                self.End10 = self.NxtNum10
            self.NxtNum10 += 1
            self.counter += 1
    self.NxtNum10 = 'FINISHED'
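For what it's worth, the `r.status == 110` check above can never fire: 110 is an OS-level errno (`ETIMEDOUT` on Linux), not an HTTP status code, so a timed-out connection surfaces as an exception rather than a response. A minimal sketch of catching it with `try`/`except` (the `fetch_status` and `flaky_open` names are hypothetical stand-ins for the real request call):

```python
import errno
import socket

def fetch_status(open_fn):
    """Call open_fn(); report 'timeout' instead of dying on errno 110."""
    try:
        return open_fn()
    except socket.timeout:
        return 'timeout'
    except OSError as e:
        if e.errno == errno.ETIMEDOUT:  # errno 110 on Linux
            return 'timeout'
        raise

# Simulated request that times out, standing in for http.request('GET', url):
def flaky_open():
    raise OSError(errno.ETIMEDOUT, 'Connection timed out')

print(fetch_status(flaky_open))  # → timeout
```

In the real loop you'd wrap the `http.request(...)` call this way and `continue` (or retry with backoff) when it returns `'timeout'`.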
This is being run through threads; I wouldn't think that should make any difference. Like I said, the initial write-up using urllib (2.7) was slow as well, and it wasn't using threads: I was just running the program 10 times, just like I always have on Windows.
Is there any faster way of grabbing stuff from the internet with Python?
In summary: Python code is slowed down by the compilation and interpretation that occurs at runtime. Compare this to a statically typed, compiled language, which runs just the CPU instructions once compiled. It's actually possible to extend Python with compiled modules that are written in C.
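You can see this effect without writing any C yourself, since many builtins are already C-implemented. A rough illustration (timings will vary by machine):

```python
import timeit

def py_sum(n):
    """Pure-Python loop: every iteration goes through the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total

n = 100000
t_py = timeit.timeit(lambda: py_sum(n), number=20)
t_c = timeit.timeit(lambda: sum(range(n)), number=20)  # sum() is implemented in C
print(t_c < t_py)  # the C-implemented sum is typically several times faster
```

That said, a download job like this one is mostly waiting on the network, so interpreter overhead is unlikely to be the whole story.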
I find that instead of using urlretrieve directly, the following method is much faster:
import urllib2

resp = urllib2.urlopen(url)        # open the connection
respHtml = resp.read()             # read the whole response body into memory
binfile = open(filename, "wb")
binfile.write(respHtml)            # write the file directly
binfile.close()
This writes the file directly. Hope it can help.
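One caveat with `resp.read()` is that it holds the entire file in memory. For larger downloads, a streaming variant with `shutil.copyfileobj` copies the response in fixed-size chunks instead. A Python 3 sketch, demonstrated against a local `file://` URL so it runs without a network connection (the paths here are temporary stand-ins for a real download URL and filename):

```python
import os
import shutil
import tempfile
import urllib.request

# Create a local 100,000-byte file to stand in for a remote resource.
src = tempfile.NamedTemporaryFile(delete=False, suffix='.bin')
src.write(b'x' * 100000)
src.close()

# Stream it to a destination file in 64 KiB chunks instead of one read().
dst_path = src.name + '.copy'
with urllib.request.urlopen('file://' + src.name) as resp, \
        open(dst_path, 'wb') as out:
    shutil.copyfileobj(resp, out, length=64 * 1024)

copied = os.path.getsize(dst_path)
print(copied)  # → 100000
```

With a real `http://` URL the same `urlopen` + `copyfileobj` pattern applies; peak memory stays at one chunk rather than the whole file.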