I have a program in which each thread reads in files many lines at a time from a file, processes the lines, and writes the lines out to a different file. Four threads split the list of files to process among them. I'm having strange performance issues across two cases:
This is internal software I'm working on so unfortunately I can't share any source code, but the main steps of the program are:
String[] array.What I don't understand is why 30k files with 12 lines each gives me greater performance than a few files with many lines each. I would have expected that the overhead of opening and closing the files to be greater than that of reading a single file. In addition, the decline in performance of the former case is exponential in nature.
I've set the maximum heap size to 1024 MB and it appears to use 100 MB at most, so an overtaxed GC isn't the problem. Do you have any other ideas?
From your numbers, I guess that GC is probably not the issue. I suspect that this is a normal behavior of a disk, being operated on by many concurrent threads. When the files are big, the disk has to switch context between the threads many times (producing significant disk seek time), and the overhead is apparent. With small files, maybe they are read as a single chunk with no extra seek time, so threads do not interfere with each other too much.
When working with a single, standard disk, serial IO is usually better that parallel IO.
I am assuming that the files are located on the same disk, in which case you are probably thrashing the disk (or invalidating the disk\OS cache) with multiple threads attempting to read concurrently and write concurrently. A better pattern may be to have a dedicated reader\writer thread to handle IO, and then alter your pattern so that the job of transform (which sounds expensive) is handled by multiple threads. Your IO thread can fetch and overlap writing with the transform operations as results become available. This should stop disk thrashing, and balance the IO and CPU side of your pattern.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With