
Low performance with BufferedReader

I am processing a number of text files line by line using BufferedReader.readLine().

Two files have the same size (130MB), but one takes 40 sec to process while the other takes 75 sec.

I noticed that one file has 1.8 million lines while the other has 2.1 million. But when I tried to process a file of the same size with 3.0 million lines, it took 30 minutes.

So my questions are:

  1. Is this behavior caused by the seek time of BufferedReader? (I want to know how BufferedReader reads and parses a file line by line.)

  2. Is there any way to read the file line by line faster?

OK friends, here are some more details.

I split each line into three parts using a regex, then use SSTableSimpleUnsortedWriter (provided by Cassandra) to write them to a file as key, column and value. After 16MB of data has been processed, it is flushed to disk.

But the processing logic is the same for all the files. Even a 330MB file with fewer lines (around 1 million) gets processed in 30 sec. What could be the reason?

deviceWriter = new SSTableSimpleUnsortedWriter(
        directory,
        keyspace,
        "Devices",
        UTF8Type.instance,
        null,
        16);

Pattern pattern = Pattern.compile("[\\[,\\]]");
while ((line = br.readLine()) != null)
{
    // split the line into row key, column name and value
    long timestamp = System.currentTimeMillis() * 1000;
    deviceWriter.newRow(bytes(rowKey));
    deviceWriter.addColumn(bytes(colmName), bytes(value), timestamp);

}

I have changed -Xmx256M to -Xmx1024M, but it does not help.

Update: From what I observe, I am writing into a buffer (in physical memory), and as the number of writes into the buffer increases, newer writes take longer. (This is my guess.)

Please reply.

samarth asked Aug 24 '11 16:08


2 Answers

The only thing BufferedReader does is read from the underlying Reader into an internal char[] buffer with a default size of 8K, and all methods work on that buffer until it's exhausted, at which point another 8K (or whatever) is read from the underlying Reader. The readLine() is sort of tacked on.

Correct use of BufferedReader should definitely not result in the running time rising from 40sec at 1.8m lines to 30 minutes at 3m lines. There must be something wrong with your code. Show it to us.
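
One way to check whether BufferedReader itself is the bottleneck is to time a loop that only reads lines and discards them, with no regex splitting and no Cassandra writer. The following is a minimal sketch, not the asker's code: the class name ReadOnlyTiming and the helper countLines are made up for illustration, and it assumes the file is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, used only to time the read path in isolation.
class ReadOnlyTiming {

    // Reads every line, discards it, and returns how many lines were read.
    // Assumes UTF-8 input; a different encoding needs a different Charset.
    static long countLines(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path to the input file
        long start = System.nanoTime();
        long lines = countLines(file);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines read in " + elapsedMs + " ms");
    }
}
```

If this loop finishes in seconds on the 3 million line file, the time is being spent in the per-line processing rather than in readLine().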

Another possibility is that your JVM does not have enough heap memory and spends most of the 30 minutes doing garbage collection because its heap is 99% full; with larger input you would eventually get an OutOfMemoryError. What are you doing with the lines you have processed? Are they kept in memory? Does running the program with the -Xmx1024M command line option make a difference?
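
The garbage collection theory can be checked from inside the program using the standard java.lang.management API. This is a rough sketch under my own naming (GcProbe, totalGcMillis, usedHeapBytes and report are hypothetical helpers, not part of the asker's code); calling report() every few hundred thousand lines should show whether GC time grows along with heap usage.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe for logging heap usage and accumulated GC time.
class GcProbe {

    // Sums the approximate collection time (ms) reported by all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Heap currently in use: allocated heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One-line summary suitable for periodic logging inside the read loop.
    static String report() {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        long usedMb = usedHeapBytes() / (1024 * 1024);
        return "heap " + usedMb + "MB of " + maxMb + "MB, GC time " + totalGcMillis() + " ms";
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running the JVM with the -verbose:gc option gives similar information without code changes.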

Michael Borgwardt answered Oct 13 '22 21:10


Look into NIO buffers, as they are more optimized than BufferedReader.

Here is a code snippet from another forum: http://www.velocityreviews.com/forums/t719006-bufferedreader-vs-nio-buffer.html

FileChannel fc = new FileInputStream("File.txt").getChannel();
ByteBuffer buffer = ByteBuffer.allocate(1024);
fc.read(buffer);
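
The snippet above reads only the first 1024 bytes. A fuller sketch of the same idea, reading the whole file through a FileChannel in a loop, might look like the following. The class name ChannelLineCount, the method countNewlines and the 64KB buffer size are my own choices for illustration; the code only counts '\n' bytes and does not decode characters, so whether it is actually faster than BufferedReader for a given workload has to be measured.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example: scan a file through a FileChannel and count newline bytes.
class ChannelLineCount {

    static long countNewlines(Path file) throws IOException {
        long newlines = 0;
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // assumed buffer size
            // read() fills the buffer and returns -1 at end of file
            while (fc.read(buffer) != -1) {
                buffer.flip();                 // switch from filling to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        newlines++;
                    }
                }
                buffer.clear();                // make the buffer ready for the next read
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNewlines(Paths.get(args[0])) + " newline bytes");
    }
}
```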

Edit: Also look into this thread: Read large files in Java

Farmor answered Oct 13 '22 22:10