
When reading a file, is there an advantage to using a power of 2? [duplicate]

Tags:

php

Possible Duplicate:
How do you determine the ideal buffer size when using FileInputStream?

Is fread($file, 8192) any better or safer than fread($file, 10000)? Why do most examples use a power of two?

matthewdaniel asked Dec 04 '12 16:12


3 Answers

Please see the accepted answer to this question: How do you determine the ideal buffer size when using FileInputStream? It says:

Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so you are reading a few bytes more than the disk block, the operations with the file system can be extremely inefficient (i.e. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, then you pay the price of the disk -> RAM latency as well.

This is why you see most buffers sized as a power of 2, and generally larger than (or equal to) the disk block size. This means that one of your stream reads could result in multiple disk block reads - but those reads will always use a full block - no wasted reads.
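The block arithmetic in the quoted answer can be checked with a small sketch. The helper names `blocks_touched` and `total_block_touches` are hypothetical, and the 4096-byte block size is an assumption; the code only counts which blocks a read overlaps and says nothing about what the OS cache actually does.

```php
<?php
// Hypothetical helper: number of file-system blocks that a read of
// $length bytes starting at byte $offset overlaps.
// The 4096-byte default block size is an assumption.
function blocks_touched(int $offset, int $length, int $blockSize = 4096): int
{
    if ($length <= 0) {
        return 0;
    }
    $first = intdiv($offset, $blockSize);
    $last  = intdiv($offset + $length - 1, $blockSize);
    return $last - $first + 1;
}

// Sum of blocks touched when a file of $fileSize bytes is read
// sequentially in chunks of $chunk bytes.
function total_block_touches(int $fileSize, int $chunk, int $blockSize = 4096): int
{
    $total = 0;
    for ($offset = 0; $offset < $fileSize; $offset += $chunk) {
        $total += blocks_touched($offset, min($chunk, $fileSize - $offset), $blockSize);
    }
    return $total;
}

echo blocks_touched(0, 4096), "\n"; // 1: exactly one block
echo blocks_touched(0, 4100), "\n"; // 2: four bytes spill into the next block
echo blocks_touched(0, 8192), "\n"; // 2: two full blocks, nothing partial
```

Calling `total_block_touches()` for a 1 MiB file with 8192-byte chunks touches each of the 256 blocks once, while 4100-byte chunks touch most blocks twice. Whether that costs anything measurable depends on caching and read-ahead.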

Although that question is about Java, the answer is not; it is pretty much language-agnostic. It covers all the factors I'm aware of regarding buffer sizes.

andr answered Oct 10 '22 17:10


Either because:

  • when picking arbitrary numbers programmers like to pick powers of two, or
  • as a premature optimization, the programmer thinks that reading in multiples of the block size will give some sort of speed boost.
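Whether the chunk size matters on a given machine is easy to measure rather than guess. Below is a rough sketch, not a rigorous benchmark: `timed_read` is a hypothetical helper, the 4 MiB temp file is an arbitrary choice, and the timings will be dominated by the OS cache since the file has just been written.

```php
<?php
// Hypothetical helper: read a whole file in chunks of $chunkSize bytes
// and return [total bytes read, elapsed seconds].
function timed_read(string $path, int $chunkSize): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $bytes = 0;
    $start = microtime(true);
    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        if ($data === false) {
            break;
        }
        $bytes += strlen($data);
    }
    $elapsed = microtime(true) - $start;
    fclose($handle);
    return [$bytes, $elapsed];
}

// Assumed test data: 4 MiB of filler in the system temp directory.
$path = tempnam(sys_get_temp_dir(), 'buf');
file_put_contents($path, str_repeat('x', 4 * 1024 * 1024));

foreach ([8192, 10000] as $size) {
    [$bytes, $seconds] = timed_read($path, $size);
    printf("%5d-byte chunks: %d bytes in %.4f s\n", $size, $bytes, $seconds);
}

unlink($path);
```

Both chunk sizes read the same number of bytes; only the timings differ, and they should be averaged over several runs before drawing any conclusion.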
Andy Lester answered Oct 10 '22 16:10


Operating systems allocate memory in pages (typically 4k, but sometimes 8k).

In this case, using a buffer size that is a multiple of 8192 bytes makes for more efficient memory allocation (since it also caters for multiples of 4096 bytes).

If you request 13k of memory, 16k will be used anyway, so why not ask for 16k to start with?
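The 13k to 16k rounding is plain arithmetic. Here is a minimal sketch, assuming a 4096-byte page; `round_up_to_page` is a hypothetical helper that shows the calculation only, not how PHP or the OS actually allocates.

```php
<?php
// Hypothetical helper: round a requested size up to the next multiple
// of the page size. 4096 is an assumed page size; real systems may differ.
function round_up_to_page(int $bytes, int $pageSize = 4096): int
{
    return intdiv($bytes + $pageSize - 1, $pageSize) * $pageSize;
}

echo round_up_to_page(13 * 1024), "\n"; // 16384, i.e. 16k
echo round_up_to_page(8192), "\n";      // 8192, already a whole number of pages
echo round_up_to_page(10000), "\n";     // 12288, so 2288 bytes go unused
```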

CPU instruction sets are also optimised to work with data that is aligned to certain boundaries, be it 32, 64, or 128 bits. Working with data that is aligned to an odd boundary (3 or 5 bytes, say) adds processing overhead.

This is not specific to PHP, which uses the Zend Memory Manager on top of the OS's own memory management, and probably allocates larger blocks of memory up-front and takes the concern of memory management away from the user.

Leigh answered Oct 10 '22 16:10