Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Requests taking a long time

Basically I am working on a python project where I download and index files from the sec edgar database. The problem however, is that when using the requests module, it take a very long time to save the text in a variable (between ~130 and 170 seconds for one file).

The file roughly has around 16 million characters, and I wanted to see if there was any way to easily lower the time it takes to retrieve the text. -- Example:

import requests

url ="https://www.sec.gov/Archives/edgar/data/0001652044/000165204417000008/goog10-kq42016.htm"

r = requests.get(url, stream=True)

print(r.text)

Thanks!

like image 757
Jake Schurch Avatar asked Aug 07 '17 17:08

Jake Schurch


1 Answers

What I found is in the code for r.text, specifically when no encoding was given ( r.encoding == 'None' ). The time spend detecting the encoding was 20 seconds, I was able to skip it by defining the encoding.

...
r.encoding = 'utf-8' 
...

Additional details

In my case, my request was not returning an encoding type. The response was 256k in size, the r.apparent_encoding was taking 20 seconds.

Looking into the text property function. It tests to see if there is an encoding. If there is None, it will call the apperent_encoding function which will scan the text to autodetect the encoding scheme.

On a long string this will take time. By defining the encoding of the response ( as described above), you will skip the detection.

Validate that this is your issue

in your above example :

from datetime import datetime    
import requests

url = "https://www.sec.gov/Archives/edgar/data/0001652044/000165204417000008/goog10-kq42016.htm"

r = requests.get(url, stream=True)

print(r.encoding)

print(datetime.now())
enc = r.apparent_encoding
print(enc)

print(datetime.now())
print(r.text)
print(datetime.now())

r.encoding = enc
print(r.text)
print(datetime.now())

of course the output may get lost in the printing, so I recommend you run the above in an interactive shell, it may become more aparent where you are losing the time even without printing datetime.now()

like image 181
AlainChiasson Avatar answered Feb 11 '23 06:02

AlainChiasson