I have a piece of code that reads a hell of a lot (hundreds of thousands) of relatively small files (a couple of KB each) from the local file system in a loop. For each file a java.io.FileInputStream is created to read the content. The process is very slow and takes ages.

Do you think that wrapping the FIS into a java.io.BufferedInputStream would make a significant difference?
If you aren't already using a byte[] buffer of a decent size in the read/write loop (the current implementation of BufferedInputStream uses 8 KB), then it will certainly make a difference. Give it a try yourself. Don't forget to wrap any OutputStream in a BufferedOutputStream as well.
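As a minimal sketch of what that looks like (the file names here are placeholders, not from the question), a read/write loop with buffered streams and an explicit byte[] buffer:

    import java.io.*;

    public class BufferedCopy {
        // Copies one file using buffered streams plus an explicit byte[] buffer.
        static void copy(File source, File target) throws IOException {
            try (InputStream in = new BufferedInputStream(new FileInputStream(source));
                 OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
                byte[] buffer = new byte[8192]; // same size BufferedInputStream defaults to
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n); // write only the bytes actually read
                }
            }
        }
    }

With a bulk buffer like this already in place, the BufferedInputStream wrapper adds little; it matters most when callers read in small pieces.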
But if you are already buffering with a byte[], and/or it makes only a little difference after all, then you've hit the hard disk and I/O controller speed as the bottleneck.
I very much doubt whether that will make any difference.
Your fundamental problem is the hundreds of thousands of tiny files. Reading those is going to make the disk thrash and take forever no matter how you do it; you'll spend 99.9% of the time waiting on mechanical movement inside the hard disk.
There are two ways to fix this:

1. Save your data on an SSD, which has much lower (as in nearly zero) seek latency.
2. Rearrange your data into a few large files and read those sequentially.
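As a sketch of the second option (the archive name "data.zip" and the per-entry handling are assumptions, not details from the question), packing the small files into one archive turns hundreds of thousands of directory opens into a single open plus sequential reads:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public class ArchiveRead {
        public static void main(String[] args) throws IOException {
            // "data.zip" is a placeholder: all the small files packed into one archive
            try (ZipFile zip = new ZipFile("data.zip")) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    try (InputStream in = zip.getInputStream(entry)) {
                        byte[] content = in.readAllBytes(); // one bulk read per entry (Java 9+)
                        // process(content) -- hypothetical handler for each small file
                    }
                }
            }
        }
    }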
That depends on how you're reading the data. If you're reading from the FileInputStream in a very inefficient way (for example, calling read() byte-by-byte), then using a BufferedInputStream could improve things dramatically. But if you're already using a reasonable-sized buffer with FileInputStream, switching to a BufferedInputStream won't matter.
Since you're talking about a large number of very small files, there's a strong possibility that a lot of the delay is due to directory operations (open, close), not the actual reading of bytes from the files.
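If the per-file read itself is the inefficiency, a sketch like the following (the directory path and handler are hypothetical) reads each small file in one bulk call instead of byte-by-byte:

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SmallFileRead {
        public static void main(String[] args) throws IOException {
            // "/data/files" is a placeholder directory holding the small files
            try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get("/data/files"))) {
                for (Path file : dir) {
                    byte[] content = Files.readAllBytes(file); // one bulk read per file
                    // process(content) -- hypothetical per-file handler
                }
            }
        }
    }

Note that this still pays the open/close cost once per file, which is exactly the overhead this answer suspects dominates.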