
Improve BufferedReader Speed

I am crunching through many gigabytes of text data and was wondering whether there is a way to improve performance. For example, iterating line by line over 10 gigabytes of data, without processing it at all, takes about 3 minutes.

Basically I have a dataIterator wrapper that contains a BufferedReader. I continuously call this iterator, which returns the next line.
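For concreteness, a wrapper of this kind might look roughly like the sketch below. The class name `LineIterator` and the buffer-size parameter are assumptions made for illustration; they are not taken from the actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the dataIterator wrapper described above.
class LineIterator implements Iterator<String>, AutoCloseable {
    private final BufferedReader reader;
    private String nextLine; // line read ahead; null once the input is exhausted

    // bufferSizeChars is an assumed tuning knob (BufferedReader defaults to 8192 chars).
    LineIterator(Reader source, int bufferSizeChars) {
        this.reader = new BufferedReader(source, bufferSizeChars);
        advance();
    }

    // Reads one line ahead so hasNext() can answer without doing I/O.
    private void advance() {
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        advance();
        return current;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Each call to `next()` returns one newly allocated `String` and makes one `readLine()` call, which is where the two suspicions below come from.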

Is the problem the number of strings being created? Or perhaps the number of function calls? I don't really know how to profile this application because it gets compiled as a jar and used as a STAF service.

Any and all ideas are appreciated.

esiegel asked Dec 29 '22 23:12



1 Answer

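Let's start with the basics: your application is I/O-bound. You are not suffering bad performance due to object allocation, memory, or CPU limits. Your application is running slowly because of disk access. Reading 10 gigabytes in about 3 minutes works out to roughly 55 MB/s, which is a plausible sequential read rate for a single spinning disk.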
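One way to check whether disk access really dominates is to time a raw byte read of the file against a line-by-line read of the same file. The following is only a rough sketch: the class and method names are made up for illustration, the input is assumed to be UTF-8, and the second pass may be served from the operating system's page cache, so run each measurement separately on a cold cache if you want a fair comparison.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for comparing raw read time with line iteration time.
class ReadTiming {
    // Reads the file as raw bytes with no decoding and no String creation.
    static long countRawBytes(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Iterates line by line, creating one String per line (UTF-8 assumed).
    static long countLines(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    // Prints the wall-clock time of both passes over the same file.
    static void compare(Path file) throws IOException {
        long t0 = System.nanoTime();
        long bytes = countRawBytes(file);
        long t1 = System.nanoTime();
        long lines = countLines(file);
        long t2 = System.nanoTime();
        System.out.printf("raw: %d bytes in %d ms%n", bytes, (t1 - t0) / 1_000_000);
        System.out.printf("lines: %d lines in %d ms%n", lines, (t2 - t1) / 1_000_000);
    }
}
```

If the two timings are close, the cost of creating strings and calling `readLine()` is small compared with the time spent waiting on the disk.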

If you think you can improve file access, you might need to resort to lower-level programming using JNI. File access can be made faster if you handle it more efficiently yourself, but that has to be done at a lower level.

I am not sure that using java.nio will give you the order-of-magnitude improvement you are looking for, although it might give you more freedom to do CPU- or memory-intensive work while I/O is in progress.

The reason is that java.nio mainly changes how you wait for data: its channel and buffer APIs can notify you when a buffer is ready for use, giving you asynchronous behavior that might help your performance a bit. But reading the file itself is your bottleneck, and java.nio doesn't give you anything in that area.

So try it out first, but I wouldn't get my hopes up.
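If you do want to try it, a minimal java.nio experiment could look like the sketch below. It uses plain blocking `FileChannel` reads into a direct `ByteBuffer` rather than any asynchronous notification, and it only counts newline bytes instead of building strings. The class name and the 1 MiB buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical experiment: scan a file through a FileChannel without creating Strings.
class NioLineCounter {
    static long countNewlines(Path file) throws IOException {
        // 1 MiB direct buffer; the size is an assumption, tune it for your system.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long newlines = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch from filling the buffer to draining it
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        newlines++;
                    }
                }
                buffer.clear(); // make the whole buffer available for the next read
            }
        }
        return newlines;
    }
}
```

If this takes about as long as the BufferedReader loop on the same file, that supports the point above: the time is going into reading from disk, not into the Java-side line handling.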

Yuval Adam answered Jan 12 '23 08:01
