I am using Anaconda - Python 3.5.2
I have a list of 280,000 urls. I am grabbing the data and trying to keep track of the url-to-data.
I've made about 30K requests. I am averaging 1 request per second.
response_df = pd.DataFrame()
# create the session
with requests.Session() as s:
# loop through the list of urls
for url in url_list:
# call the resource
resp = s.get(url)
# check the response
if resp.status_code == requests.codes.ok:
# create a new dataframe with the response
ftest = json_normalize(resp.json())
ftest['url'] = url
response_df = response_df.append(ftest, ignore_index=True)
else:
print("Something went wrong! Hide your wife! Hide the kids!")
response_df.to_csv(results_csv)
I ended up ditching requests, I used async and aiohttp instead. I was pulling about 1 per second with requests. The new method averages about 5 per second, and only utilizes about 20% of my system resources. I ended up using something very similar to this: https://www.blog.pythonlibrary.org/2016/07/26/python-3-an-intro-to-asyncio/
import aiohttp
import asyncio
import async_timeout
import os
async def download_coroutine(session, url):
with async_timeout.timeout(10):
async with session.get(url) as response:
filename = os.path.basename(url)
with open(filename, 'wb') as f_handle:
while True:
chunk = await response.content.read(1024)
if not chunk:
break
f_handle.write(chunk)
return await response.release()
async def main(loop):
urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
"http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
"http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
"http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
"http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
async with aiohttp.ClientSession(loop=loop) as session:
for url in urls:
await download_coroutine(session, url)
if __name__ == '__main__':
loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
also, this was helpful: https://snarky.ca/how-the-heck-does-async-await-work-in-python-3-5/ http://www.pythonsandbarracudas.com/blog/2015/11/22/developing-a-computational-pipeline-using-the-asyncio-module-in-python-3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With