
Reading large file in Java -- Java heap space

Tags: java, file, file-io

I'm reading a large TSV file (~40 GB) and trying to prune it by reading it line by line and printing only certain lines to a new file. However, I keep getting the following exception:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2894)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:532)
    at java.lang.StringBuffer.append(StringBuffer.java:323)
    at java.io.BufferedReader.readLine(BufferedReader.java:362)
    at java.io.BufferedReader.readLine(BufferedReader.java:379)

Below is the main part of the code. I specified the buffer size to be 8192 just in case. Doesn't Java clear the buffer once the buffer size limit is reached? I don't see what may cause the large memory usage here. I tried to increase the heap size but it didn't make any difference (machine with 4GB RAM). I also tried flushing the output file every X lines but it didn't help either. I'm thinking maybe I need to make calls to the GC but it doesn't sound right.

Any thoughts? Thanks a lot. BTW - I know I should call trim() only once, store it, and then use it.

Set<String> set = new HashSet<String>();
set.add("A-B");
...
...
static public void main(String[] args) throws Exception
{
    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile), "UTF-8"), 8192);
    PrintStream output = new PrintStream(outputFile, "UTF-8");

    String line = reader.readLine();
    while (line != null) {
        String[] fields = line.split("\t");
        if (set.contains(fields[0].trim() + "-" + fields[1].trim()))
            output.println(fields[0].trim() + "-" + fields[1].trim());

        line = reader.readLine();
    }

    output.close();
}
user431336 asked May 04 '11 22:05



1 Answer

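Most likely, the file does not contain the line terminators that readLine() looks for, so the reader keeps growing its internal StringBuffer without bound until the heap runs out. That matches the stack trace, where the failure happens inside BufferedReader.readLine while it appends to a StringBuffer.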
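One way to test this guess before changing the pruning code is to scan the start of the file and measure the longest stretch that has no '\n' or '\r'. The following is a minimal sketch, not part of the original answer; the class name LineTerminatorCheck, the method name, and the 64 MB scan limit are all made up for illustration. It works on raw bytes, which is safe for UTF-8 because the bytes for '\n' and '\r' never occur inside a multi-byte sequence.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LineTerminatorCheck {

    /**
     * Returns the longest run of bytes containing neither '\n' nor '\r'
     * within the first `limit` bytes of the stream (hypothetical helper).
     */
    static long longestRunWithoutTerminator(InputStream in, long limit) throws IOException {
        byte[] buf = new byte[8192];
        long longest = 0;
        long current = 0;
        long seen = 0;
        int n;
        while (seen < limit
                && (n = in.read(buf, 0, (int) Math.min(buf.length, limit - seen))) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n' || buf[i] == '\r') {
                    longest = Math.max(longest, current);
                    current = 0;
                } else {
                    current++;
                }
            }
            seen += n;
        }
        // Account for a final run that was not closed by a terminator.
        return Math.max(longest, current);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the input file; only the first 64 MB are scanned (arbitrary limit).
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            long run = longestRunWithoutTerminator(in, 64L * 1024 * 1024);
            System.out.println("Longest stretch without a line terminator: " + run + " bytes");
        }
    }
}
```

If the reported number is close to the scan limit, the file probably has no terminators that readLine() recognizes, at least in the part that was scanned.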

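The solution would be to read a fixed number of characters at a time, using the 'read' method of the reader, and then look for newlines (or other parsing tokens) within each smaller buffer, so that memory use is bounded by the buffer size rather than by the distance to the next line terminator.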
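As a rough sketch of that approach (not the answerer's code), the loop below reads 8192 characters at a time with Reader.read(char[]), splits on '\n' itself, and refuses to let a single record grow past a cap. The class name ChunkedPrune, the helper names, and the 1M-character cap are assumptions made for illustration; the "A-B" key and the tab-separated layout come from the question. If the file turns out to use a different record separator, the check for '\n' would need to change.

```java
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class ChunkedPrune {

    // Hypothetical cap on one record; fail fast instead of exhausting the heap.
    static final int MAX_RECORD_CHARS = 1 << 20;

    /** Reads fixed-size chunks, splits records on '\n', writes the matching keys. */
    static void prune(Reader in, PrintWriter out, Set<String> keep) throws IOException {
        char[] buf = new char[8192];
        StringBuilder record = new StringBuilder();
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (c == '\n') {
                    emitIfWanted(record, out, keep);
                    record.setLength(0);
                } else if (record.length() >= MAX_RECORD_CHARS) {
                    throw new IOException("Record exceeds " + MAX_RECORD_CHARS
                            + " chars; the file may use a different separator");
                } else {
                    record.append(c);
                }
            }
        }
        // Handle a last record that has no trailing newline.
        if (record.length() > 0) {
            emitIfWanted(record, out, keep);
        }
    }

    /** Same filter as in the question, with each field trimmed once. */
    static void emitIfWanted(CharSequence record, PrintWriter out, Set<String> keep) {
        String[] fields = record.toString().split("\t");
        if (fields.length < 2) {
            return; // skip records with fewer than two fields
        }
        String key = fields[0].trim() + "-" + fields[1].trim();
        if (keep.contains(key)) {
            out.println(key);
        }
    }

    public static void main(String[] args) throws IOException {
        Set<String> keep = new HashSet<String>();
        keep.add("A-B");
        // args[0] = input file, args[1] = output file
        try (Reader in = new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(args[1]), StandardCharsets.UTF_8)))) {
            prune(in, out, keep);
        }
    }
}
```

Throwing once the cap is reached is one possible design choice; it turns a slow OutOfMemoryError into an immediate, descriptive failure. Truncating or skipping oversized records would be an alternative if partial output is acceptable.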

toadaly answered Sep 28 '22 06:09