 

Ideal Chunk Size for python requests

Is there any guideline on selecting chunk size?

I tried different chunk size but none of them give download speed comparable to browser or wget download speed

here is snapshot of my code

    r = requests.get(url, headers=headers, stream=True)
    total_length = r.headers.get('content-length')
    if total_length is not None:  # content-length header may be absent
        total_length = int(total_length)
    for chunk in r.iter_content(chunk_size=1024):
        f.write(chunk)

Any help would be appreciated.

Edit: I tried networks with different speeds and was able to achieve higher speeds than on my home network. But when I tested wget and the browser, the speed was still not comparable.

Thanks

asked Apr 29 '14 by user3570335



1 Answer

You lose time switching between reads and writes, and the limit on the chunk size is AFAIK only what you can store in memory. So as long as you aren't very concerned about keeping memory usage down, go ahead and specify a large chunk size, such as 1 MB (1024 * 1024 bytes) or even 10 MB. Chunk sizes in the 1024-byte range (or smaller, as it sounds like you've tested) will slow the process down substantially.

For a very heavy-duty situation where you want to get as much performance as possible out of your code, you could look at the io module for buffering etc. But I think increasing the chunk size by a factor of 1000 or 10000 or so will probably get you most of the way there.
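A minimal sketch of the advice above, assuming the `requests` library is installed; the function name `download` and the destination path are hypothetical, and the 1 MB chunk size is the suggestion from the answer, not a hard rule:

```python
import requests

CHUNK_SIZE = 1024 * 1024  # 1 MB: far fewer read/write switches than 1 KB chunks

def download(url, dest):
    # stream=True keeps the whole body out of memory; we write it chunk by chunk
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
                f.write(chunk)  # each chunk is at most CHUNK_SIZE bytes
```

With a chunk size this large, the loop body runs roughly a thousand times less often than with 1024-byte chunks, which is usually enough to close most of the gap with wget without reaching for the io module's buffering.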

answered Sep 23 '22 by Andrew Gorcester