
Why does reading a file into memory take 4x the memory in Java?

I have the following code, which reads in a file, appends a \r\n to the end of each line, and puts the result in a StringBuffer:

public InputStream getInputStream() throws Exception {
    StringBuffer holder = new StringBuffer();
    try{
        FileInputStream reader = new FileInputStream(inputPath);


        BufferedReader br = new BufferedReader(new InputStreamReader(reader));
        String strLine;
        //Read File Line By Line
        boolean start = true;
        while ((strLine = br.readLine()) != null)   {
            if( !start )    
                holder.append("\r\n");

            holder.append(strLine);
            start = false;
        }
        //Close the input stream
        reader.close();
    }catch (Throwable e){//this is where the heap error is caught up to 2Gb
      System.err.println("Error: " + e.getMessage());
    }


    return new StringBufferInputStream(holder.toString());
}

I tried reading in a 400 MB file, and I changed the max heap space to 2 GB, yet it still fails with an out-of-memory heap error. Any ideas?

erotsppa asked Jul 06 '09 21:07



1 Answer

You have a number of problems here:

  • Unicode: Java stores each character as a 2-byte char, so text takes twice as much space in memory as on disk (assuming a 1-byte encoding)
  • StringBuffer resizing: when the buffer is full it allocates a larger array and copies the old contents over, which could double (permanently) and triple (temporarily) the occupied memory, though this is the worst case
  • StringBuffer.toString() temporarily doubles the occupied memory since it makes a copy

All of these combined mean that you could temporarily require up to 8 times your file's size in RAM, i.e. 3.2 GB for a 400 MB file. Even if your machine physically has that much RAM, it has to be running a 64-bit OS and JVM to actually get that much heap for the JVM.
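As a rough illustration, the 8x figure can be read as the three effects above multiplied together. This is only one way to arrive at the number, not a measurement; the class and method names are made up for the example, and the factors are worst-case assumptions.

```java
// Hypothetical back-of-the-envelope estimate; the factors are worst-case
// assumptions taken from the three points above, not measured values.
class HeapEstimate {
    static long worstCaseBytes(long fileBytes) {
        long asChars = fileBytes * 2;   // 1-byte encoding on disk -> 2-byte char in memory
        long resized = asChars * 2;     // buffer capacity may be about twice its content after growing
        long withCopy = resized * 2;    // toString() adds a copy on top
        return withCopy;
    }
}
```

For a 400 MB file (400,000,000 bytes) this gives 3,200,000,000 bytes, which is above a 2 GB heap limit.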

All in all, it's simply a horrible idea to keep such a huge String in memory, and it's totally unnecessary as well: since your method returns an InputStream, all you really need is a FilterInputStream that adds the line breaks on the fly.
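A minimal sketch of such a filter might look like the following. The class name CrLfInputStream is made up for illustration. It works on raw bytes, so it assumes an encoding in which \r and \n are the single bytes 0x0D and 0x0A (ASCII, ISO-8859-1, UTF-8). Unlike the readLine() loop in the question, it keeps a trailing line break (as \r\n) instead of dropping it.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of the JDK): rewrites every \n, \r or \r\n
// in the underlying stream to \r\n while the bytes are being read.
class CrLfInputStream extends FilterInputStream {
    private boolean pendingLf = false; // '\r' was returned, '\n' is still owed
    private boolean lastWasCr = false; // previous source byte was '\r'

    CrLfInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (pendingLf) {
            pendingLf = false;
            return '\n';
        }
        int b = in.read();
        if (b == '\n' && lastWasCr) {
            // Second half of a source \r\n pair; its \r\n was already emitted.
            b = in.read();
        }
        lastWasCr = (b == '\r');
        if (b == '\r' || b == '\n') {
            pendingLf = true;
            return '\r';
        }
        return b; // ordinary byte, or -1 at end of stream
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // FilterInputStream passes bulk reads straight to the source,
        // so route them through read() to apply the rewriting.
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int b = read();
            if (b == -1) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public int available() {
        return pendingLf ? 1 : 0; // conservative: only the owed '\n' is certain
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would not restore the filter's own state
    }
}
```

With this, getInputStream() could return something like new CrLfInputStream(new BufferedInputStream(new FileInputStream(inputPath))), so memory use stays at the size of the read buffer regardless of file size. The BufferedInputStream matters here because the filter pulls from its source one byte at a time.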

Michael Borgwardt answered Oct 21 '22 22:10