
Python 3 urllib vs. requests performance

I'm using Python 3.5 and comparing the performance of the urllib module against the requests module. I wrote two clients in Python: the first uses the urllib module and the second uses the requests module. Both generate binary data, which I send to a server based on Flask, and the Flask server returns binary data to the client. I found that sending the data from the client to the server takes the same time with both modules (urllib, requests), but returning the data from the server to the client is more than twice as fast with urllib as with requests. I'm working on localhost.

My question is: why? What am I doing wrong with the requests module that makes it slower?

This is the server code:

from flask import Flask, request
import os

app = Flask(__name__)

@app.route('/onStringSend', methods=['GET', 'POST'])
def onStringSend():
    # Return the pre-generated binary payload to the client.
    return data

if __name__ == '__main__':
    data_size = int(1e7)            # 10 MB payload
    data = os.urandom(data_size)
    app.run(host="0.0.0.0", port=8080)

This is the client code based on urllib:

import urllib.request as urllib2
from timeit import default_timer as timer
import os

data_size = int(1e7)
num_of_runs = 20
url = 'http://127.0.0.1:8080/onStringSend'

def send_binary_data():
    data = os.urandom(data_size)
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94;  Windows NT)', 'Content-Length': '%d' % len(data), 'Content-Type':  'application/octet-stream'}
    req = urllib2.Request(url, data, headers)
    round_trip_time_msec = [0] * num_of_runs
    for i in range(num_of_runs):
        t1 = timer()
        resp = urllib2.urlopen(req)
        response_data = resp.read()
        t2 = timer()
        round_trip_time_msec[i] = (t2 - t1) * 1000

    t_max = max(round_trip_time_msec)
    t_min = min(round_trip_time_msec)
    t_average = sum(round_trip_time_msec)/len(round_trip_time_msec)

    print('max round trip time [msec]: ', t_max)
    print('min round trip time [msec]: ', t_min)
    print('average round trip time [msec]: ', t_average)


send_binary_data()

This is the client code based on requests:

import requests
import os
from timeit import default_timer as timer


url = 'http://127.0.0.1:8080/onStringSend'
data_size = int(1e7)
num_of_runs = 20


def send_binary_data():
    data = os.urandom(data_size)
    s = requests.Session()
    s.headers['User-Agent'] = 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94;Windows NT)'
    s.headers['Content-Type'] = 'application/octet-stream'
    s.headers['Content-Length'] = '%d' % len(data)

    round_trip_time_msec = [0] * num_of_runs
    for i in range(num_of_runs):
        t1 = timer()
        response_data = s.post(url=url, data=data, stream=False, verify=False)
        t2 = timer()
        round_trip_time_msec[i] = (t2 - t1) * 1000

    t_max = max(round_trip_time_msec)
    t_min = min(round_trip_time_msec)
    t_average = sum(round_trip_time_msec)/len(round_trip_time_msec)

    print('max round trip time [msec]: ', t_max)
    print('min round trip time [msec]: ', t_min)
    print('average round trip time [msec]: ', t_average)

send_binary_data()

Thanks very much.

asked May 10 '16 by user1470957

1 Answer

First of all, to reproduce the problem, I had to add the following line to your onStringSend function:

request.get_data()

Otherwise, I was getting “connection reset by peer” errors because the server’s receive buffer kept filling up.
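For reference, a minimal sketch of the route with that call added (based on the server code in the question); request.get_data() simply drains the posted body before the response is sent:

@app.route('/onStringSend', methods=['GET', 'POST'])
def onStringSend():
    # Consume the uploaded body so the server's receive buffer doesn't fill up.
    request.get_data()
    return data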

Now, the immediate reason for this problem is that Response.content (which is called implicitly when stream=False) iterates over the response data in chunks of 10240 bytes:

self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

Therefore, the easiest way to solve the problem is to use stream=True, thus telling Requests that you will be reading the data at your own pace:

response_data = s.post(url=url, data=data, stream=True, verify=False).raw.read()

With this change, the performance of the Requests version becomes more or less the same as that of the urllib version.
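As a variation (not from the original answer), you can also keep stream=True and pull the body through iter_content with an explicitly larger chunk size; the 64 KiB value below is just an illustrative choice:

resp = s.post(url=url, data=data, stream=True, verify=False)
# 64 KiB chunks, much larger than the 10240-byte default used by Response.content
response_data = b''.join(resp.iter_content(chunk_size=64 * 1024))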

Please also see the “Raw Response Content” section in the Requests docs for useful advice.

Now, the interesting question remains: why is Response.content iterating in such small chunks? After talking to Cory Benfield, a core developer of Requests, it looks like there may be no particular reason. I filed issue #3186 in Requests to look further into this.

answered Oct 13 '22 by Vasiliy Faronov