 

Determining Appropriate Buffer Size

I am using ByteBuffer.allocateDirect() to allocate some buffer memory for reading a file into memory and then eventually hashing that file's bytes to produce a file hash (SHA). The input files range greatly in size, from a few KB to several GB.

I have read several threads and pages (even some on SO) about selecting a buffer size. Some advised choosing the size the native filesystem uses, to minimize the chance of a read operation touching a partial block, etc. For example, with a 4100-byte buffer and NTFS's default 4096-byte blocks, the extra 4 bytes would require a separate read operation, which is extremely wasteful.

So I'm sticking with powers of 2: 1024, 2048, 4096, 8192, etc. I have seen some recommend 32 KB buffers, and others recommend making the buffer the size of the input file (probably fine for small files, but what about large ones?).

How important is it to stick to native-block-sized buffers? Speaking in modern terms (assuming a modern SATA drive or better with at least 8 MB of on-drive cache, plus whatever other OS "magic" optimizes I/O), how critical is the buffer size, and how should I determine what to set mine to? Should I set it statically, or determine it dynamically? Thanks for any insight.
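(For the dynamic route, I imagine something like the sketch below — pickBufferSize is a name I made up, and FileStore.getBlockSize() only exists since Java 10:)

    import java.io.IOException;
    import java.nio.file.FileStore;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class BlockSizeProbe {
        // Hypothetical helper: ask the filesystem for its block size,
        // falling back to a static power-of-2 default when unsupported.
        static int pickBufferSize(Path file) {
            final int fallback = 8192;
            try {
                FileStore store = Files.getFileStore(file);
                long block = store.getBlockSize(); // Java 10+
                // Round up to at least the fallback, cap at 1 MB.
                return (int) Math.min(Math.max(block, fallback), 1 << 20);
            } catch (IOException | UnsupportedOperationException e) {
                return fallback;
            }
        }
    }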

asked Apr 17 '13 by SnakeDoc


1 Answer

To answer your direct question: (1) filesystems tend to use powers of 2, so you want to do the same; (2) the larger your working buffer, the less effect any mis-sizing will have.

As you say, if you allocate 4100 bytes and the actual block size is 4096, you'll need two reads to fill the buffer. If, instead, you have a 1,000,000-byte buffer, then being one block high or low doesn't matter (because it takes roughly 245 4096-byte blocks to fill that buffer). Moreover, the larger buffer means that the OS has a better chance to order the reads.

That said, I wouldn't use NIO for this. Instead, I'd use a simple BufferedInputStream, with maybe a 1k buffer for my read()s.
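For instance, here's a minimal sketch of that approach (SHA-256 and the 1k scratch buffer are just illustrative choices): let a DigestInputStream update the digest as you drain the stream through a BufferedInputStream:

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class FileHasher {
        static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // BufferedInputStream does the block-sized disk reads;
            // DigestInputStream updates the digest as bytes pass through.
            try (InputStream in = new DigestInputStream(
                    new BufferedInputStream(Files.newInputStream(file)), digest)) {
                byte[] scratch = new byte[1024]; // the ~1k read buffer mentioned above
                while (in.read(scratch) != -1) {
                    // draining the stream is all that's needed
                }
            }
            return digest.digest();
        }
    }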

The main benefit of NIO is keeping data out of the Java heap. If you're reading and writing a file, for example, using an InputStream means that the OS reads the data into a JVM-managed buffer, the JVM copies that into an on-heap buffer, then copies it again to an off-heap buffer, then the OS reads that off-heap buffer to write the actual disk blocks (and typically adds its own buffers). In this case, NIO eliminates those native-heap copies.

However, to compute a hash, you need to have the data in the Java heap, and the MessageDigest SPI will move it there. So you don't get the benefit of NIO keeping the data off-heap, and IMO the "old IO" is easier to write.

Just don't forget that InputStream.read() is not guaranteed to read all the bytes you ask for.
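To make that concrete, here's a hand-rolled version of the same loop (again just a sketch) that feeds the digest only the n bytes each read() actually returned:

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class ManualHasher {
        static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
                byte[] buf = new byte[1024];
                int n;
                // read() may return fewer bytes than buf.length,
                // so only the first n bytes go into the digest.
                while ((n = in.read(buf)) != -1) {
                    digest.update(buf, 0, n);
                }
            }
            return digest.digest();
        }
    }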

answered Sep 30 '22 by parsifal