What is an efficient way for a Java multithreaded application where many threads have to read the exact same file (> 1GB in size) and expose it as an input stream? I've noticed that if there are many threads (> 32), the system starts to contend over I/O and has a lot of I/O waits.
I've considered loading the file into a byte array that's shared by all the threads - each thread would create a ByteArrayInputStream, but allocating a 1GB byte array just won't work well.
I've also considered using a single FileChannel and each thread creating an InputStream on top of it using Channels.newInputStream(), however it seems that it's the FileChannel that's maintaining the state for the InputStream.
Multiple threads can also read data from the same FITS file simultaneously, as long as the file was opened independently by each thread. This relies on the operating system to correctly deal with reading the same file by multiple processes.
Multiple threads accessing shared data simultaneously may lead to a timing dependent error known as data race condition. Data races may be hidden in the code without interfering or harming the program execution until the moment when threads are scheduled in a scenario (the condition) that break the program execution.
It seems to me that you're going to have to load the file into memory if you want to avoid IO contention. The operating system will do some buffering, but if you're finding that's not enough, you're going to have to do it yourself.
Do you really need 32 threads though? Presumably you don't have nearly that many cores - so use fewer threads and you'll get less context switching etc.
Do your threads all process the file from start to finish? If so, could you effectively split the file into chunks? Read the first (say) 10MB of data into memory, let all the threads process it, then move on to the next 10MB etc.
If that doesn't work for you, how much memory do you have compared with the size of the file? If you have plenty of memory but you don't want to allocate one huge array, you could read the whole file into memory, but into lots of separate smaller byte arrays. You'd then have to write an input stream which spans all of those byte arrays, but that should be doable.
you can open the file multiple times in readonly mode. You can access the file in any way you want. Just leave the caching to the OS. When it's too slow you might consider some kind of chunk based caching where all threads can access the same cache.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With