I am crunching through many gigabytes of text data and I was wondering if there is a way to improve performance. For example, going through 10 gigabytes of data without processing it at all, just iterating line by line, takes about 3 minutes.
Basically I have a dataIterator wrapper that contains a BufferedReader. I continuously call this iterator, which returns the next line.
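For illustration, here is a minimal sketch of what I mean (the class name and buffer size are simplified, not my exact code):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified version of the wrapper: an Iterator<String> around a
// BufferedReader. The 1 MB buffer (vs. the 8 KB default) is one knob
// worth experimenting with.
public class DataIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine;

    public DataIterator(String path) throws IOException {
        reader = new BufferedReader(new FileReader(path), 1 << 20);
        nextLine = reader.readLine();
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        try {
            nextLine = reader.readLine(); // pre-fetch the following line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }
}
```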
Is the problem the number of strings being created? Or perhaps the number of function calls? I don't really know how to profile this application because it gets compiled as a jar and used as a STAF service.
Any and all ideas are appreciated.
Let's start with the basics: your application is I/O-bound. You are not suffering bad performance due to object allocation, memory, or CPU limits. Your application is running slowly because of disk access.
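You can verify this yourself by timing a raw byte-level pass over the same file, with no String creation or line splitting at all, and comparing it against your line-by-line loop. A rough sketch (buffer sizes are just guesses to tune):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Times a raw sequential pass over a file: no Strings, no line
// splitting, just bytes. If this is nearly as slow as the line-by-line
// loop, the disk (not allocation) is the limiting factor.
public class RawReadTimer {
    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long total = 0;
        byte[] buf = new byte[64 * 1024];
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream(args[0]), 1 << 20)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.1f s (%.1f MB/s)%n",
                total, seconds, total / seconds / 1e6);
    }
}
```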
If you think you can improve on that, you may need to resort to lower-level programming via JNI: file access can be made more efficient, but only by handling it yourself at a level below what the JDK exposes.
I am not sure that using java.nio will give you the order-of-magnitude improvement you are looking for, although it may give you more freedom to do CPU- and memory-intensive work while the I/O is running. The reason is that java.nio essentially wraps the reading in an asynchronous, selector-style API, notifying you when a buffer is ready for use. That asynchronous behavior may help your performance a bit, but reading the file itself is your bottleneck, and java.nio gives you nothing in that area.

So try it out, but I wouldn't get my hopes up too high.
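If you do want to experiment, a minimal channel-based read loop looks something like this (class name and buffer size are illustrative, and the line-splitting is left as a stub):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Reads a file through a FileChannel into a direct ByteBuffer.
// This skips one copy into the Java heap, but the disk still sets
// the floor on total time.
public class NioReadSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(args[0]),
                StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB, tune as needed
            while (ch.read(buf) != -1) {
                buf.flip();
                // ... scan buf for '\n' and hand lines off for processing ...
                buf.clear();
            }
        }
    }
}
```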