
Does a Java InputStream help or hurt memory usage with large files?

I see some posts on StackOverflow that contradict each other, and I would like to get a definite answer.

I started with the assumption that using a Java InputStream would allow me to stream bytes out of a file, and thus save on memory, as I would not have to consume the whole file at once. And that is exactly what I read here:

Loading all bytes to memory is not a good practice. Consider returning the file and opening an input stream to read it, so your application won't crash when handling large files. – andrucz

Download file to stream instead of File

But then I used an InputStream to read a very large Microsoft Excel file (using the Apache POI library) and I ran into this error:

java.lang.outofmemory exception while reading excel file (xlsx) using POI

I got an OutOfMemory error.

And this crucial bit of advice saved me:

One thing that'll make a small difference is when opening the file to start with. If you have a file, then pass that in! Using an InputStream requires buffering of everything into memory, which eats up space. Since you don't need to do that buffering, don't!

I got rid of the InputStream and just used a bare java.io.File, and then the OutOfMemory error went away.

So using java.io.File is better than an InputStream when it comes to memory use? That doesn't make any sense.

What is the real answer?

lars asked Dec 18 '16 04:12




1 Answer

So you are saying that an InputStream would typically help?

It depends entirely on how the application (or library) >>uses<< the InputStream.

With what kind of follow up code? Could you offer an example of memory efficient Java?

For example:

  // Efficient use of memory
  try (InputStream is = new FileInputStream(largeFileName);
       BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
      String line;
      while ((line = br.readLine()) != null) {
          // process one line
      }
  }

  // Inefficient use of memory
  try (InputStream is = new FileInputStream(largeFileName);
       BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
      StringBuilder sb = new StringBuilder();
      String line;
      while ((line = br.readLine()) != null) {
          sb.append(line).append("\n");
      }
      String everything = sb.toString();
      // process the entire string
  }

  // Very inefficient use of memory
  try (InputStream is = new FileInputStream(largeFileName);
       BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
      String everything = "";
      String line;
      while ((line = br.readLine()) != null) {
          everything += line + "\n";
      }
      // process the entire string
  }

(Note that there are more efficient ways of reading a file into memory. The above examples are purely to illustrate the principles.)
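
As one illustration of that note, here is a minimal sketch that reads a whole file with a single call to java.nio.file.Files.readAllBytes. The class name ReadWholeFile, the helper readAll, and the UTF-8 encoding are assumptions made for this example. It avoids the line-by-line accumulation, but the entire content still ends up in memory, so it does not help with very large files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWholeFile {

    // Reads the entire file in one call and decodes it as UTF-8 (an assumption).
    // The whole content is held in memory, first as bytes and then as a String.
    static String readAll(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A small temporary file stands in for the real input.
        Path tmp = Files.createTempFile("example", ".txt");
        try {
            Files.write(tmp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(readAll(tmp).length());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```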

The general principles here are:

  • avoid holding the entire file in memory, all at the same time
  • if you have to hold the entire file in memory, then be careful about how you "accumulate" the characters.
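
The first principle applies to binary data as well. The sketch below is only an illustration: it computes a CRC-32 checksum over a stream while reusing one fixed-size buffer, so memory use stays roughly constant regardless of the input size. The class name ChunkedChecksum and the 8 KiB buffer size are assumptions chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

class ChunkedChecksum {

    // Computes a CRC-32 over the whole stream while holding at most
    // one 8 KiB buffer of its content in memory at any moment.
    static long checksum(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            crc.update(buffer, 0, n); // only the n bytes just read
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayInputStream stands in for a FileInputStream here,
        // so the example runs without a file on disk.
        byte[] data = new byte[100_000];
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(Long.toHexString(checksum(in)));
        }
    }
}
```

With a real file, the same method could be called with a FileInputStream; the buffer size would not need to grow with the file.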

The posts that you linked to above:

  • The first one is not really about memory efficiency. Rather, it is talking about a limitation of the AWS client-side library. Apparently, the API doesn't provide an easy way to stream an object while reading it. You have to save the object to a file, then open the file as a stream. Whether that is memory efficient or not depends on what the application does with the stream; see above.

  • The second one is specific to the POI APIs. Apparently, the POI library itself reads the stream contents into memory if you give it a stream. That would be an implementation limitation of that particular library.
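
To see why a library might behave differently for a File and an InputStream, consider code that needs data from the end of its input. The sketch below is a hypothetical illustration and is not POI's actual code; the class TailReader and both method names are invented for this example. With a File it can seek directly to the bytes it wants, while the naive stream version has no way to seek and buffers everything first.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class TailReader {

    // With a File, seek straight to the last `count` bytes; nothing else is read.
    static byte[] tailFromFile(File file, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long start = Math.max(0, raf.length() - count);
            byte[] tail = new byte[(int) (raf.length() - start)];
            raf.seek(start);
            raf.readFully(tail);
            return tail;
        }
    }

    // With only an InputStream there is no seek, so this naive version
    // copies the whole content into memory before looking at the end.
    static byte[] tailFromStream(InputStream in, int count) throws IOException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            all.write(buffer, 0, n);
        }
        byte[] bytes = all.toByteArray();
        return Arrays.copyOfRange(bytes, Math.max(0, bytes.length - count), bytes.length);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tail", ".bin");
        try {
            Files.write(tmp, new byte[1_000_000]);
            byte[] fromFile = tailFromFile(tmp.toFile(), 16);
            byte[] fromStream;
            try (InputStream in = Files.newInputStream(tmp)) {
                fromStream = tailFromStream(in, 16);
            }
            // Same result either way; only the memory cost differs.
            System.out.println(Arrays.equals(fromFile, fromStream));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both methods return the same bytes, but the stream version holds the full content in memory at its peak. A library that needs random access to its input faces the same trade-off when it is handed only a stream.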

Stephen C answered Oct 23 '22 18:10