I'm reading a large TSV file (~40 GB) and trying to prune it by reading it line by line and writing only certain lines to a new file. However, I keep getting the following exception:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:532)
at java.lang.StringBuffer.append(StringBuffer.java:323)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at java.io.BufferedReader.readLine(BufferedReader.java:379)
Below is the main part of the code. I specified the buffer size to be 8192 just in case. Doesn't Java clear the buffer once the buffer size limit is reached? I don't see what could be causing the large memory usage here. I tried increasing the heap size, but it didn't make any difference (machine with 4 GB RAM). I also tried flushing the output file every X lines, but it didn't help either. I'm thinking maybe I need to call the GC explicitly, but that doesn't sound right.
Any thoughts? Thanks a lot. BTW - I know I should call trim() only once, store it, and then use it.
Set<String> set = new HashSet<String>();
set.add("A-B");
...
...

static public void main(String[] args) throws Exception
{
    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile), "UTF-8"), 8192);
    PrintStream output = new PrintStream(outputFile, "UTF-8");
    String line = reader.readLine();
    while (line != null) {
        String[] fields = line.split("\t");
        if (set.contains(fields[0].trim() + "-" + fields[1].trim()))
            output.println(fields[0].trim() + "-" + fields[1].trim());
        line = reader.readLine();
    }
    output.close();
}
OutOfMemoryError: Java heap space. An easy way to deal with this error is to increase the maximum heap size with the JVM option -Xmx (e.g. "-Xmx512M"), which often makes the error go away immediately.
Java objects reside in an area called the heap. The heap is created when the JVM starts up and may grow or shrink while the application runs. When the heap becomes full, garbage is collected: objects that are no longer referenced are cleared, making space for new objects.
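If you go that route, it helps to confirm how much heap the JVM actually received. A minimal sketch (the class name and the 512 MB value are only illustrative):

public class HeapCheck {
    public static void main(String[] args) {
        // Prints the maximum heap the JVM will attempt to use, in megabytes.
        // Launch with, for example:  java -Xmx512m HeapCheck
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}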
Most likely, what's going on is that the file does not have line terminators, so the reader just keeps growing its StringBuffer without bound until it runs out of memory.
The solution would be to read a fixed number of characters at a time, using the reader's read method, and then look for newlines (or other parsing tokens) within the smaller buffers.
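A rough sketch of what that could look like, reusing the question's filtering logic (the file names, the MAX_LINE cap, and the process helper are placeholders I've added, not part of the original code):

import java.io.*;
import java.util.*;

public class Pruner {
    // Cap how long a single "line" may get; characters beyond this are discarded
    // instead of growing the buffer without bound.
    static final int MAX_LINE = 1000000;

    public static void main(String[] args) throws Exception {
        Set<String> set = new HashSet<String>();
        set.add("A-B"); // placeholder key, as in the question

        Reader reader = new InputStreamReader(new FileInputStream("input.tsv"), "UTF-8");
        PrintStream output = new PrintStream("output.tsv", "UTF-8");

        char[] buf = new char[8192];
        StringBuilder line = new StringBuilder();
        int n;
        while ((n = reader.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (c == '\n') {
                    process(line.toString(), set, output);
                    line.setLength(0);
                } else if (c != '\r' && line.length() < MAX_LINE) {
                    line.append(c);
                }
            }
        }
        process(line.toString(), set, output); // last line may lack a terminator
        reader.close();
        output.close();
    }

    static void process(String line, Set<String> set, PrintStream output) {
        String[] fields = line.split("\t");
        if (fields.length < 2) return;
        String key = fields[0].trim() + "-" + fields[1].trim();
        if (set.contains(key)) output.println(key);
    }
}

Because the StringBuilder is reset after every newline and capped at MAX_LINE characters, a file with missing or extremely distant line terminators can no longer exhaust the heap the way BufferedReader.readLine's internal buffer does.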