
Java8 Stream of files, how to control the closing of files?

Suppose I have a Java 8 Stream<FileReader> and use that stream for mapping and so on. How can I control the closing of the FileReaders used in the stream?

Note that I may not have access to the individual FileReaders, for example:

filenames.map(File::new)
    .filter(File::exists)
    .map(f -> {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(f));
        } catch (Exception e) {}
        return Optional.ofNullable(br);
    })
    .filter(Optional::isPresent)
    .map(Optional::get)
    .flatMap(...something that reads the file contents...) // From here on, the Stream no longer contains anything that gives access to the FileReaders

After some further mappings, I eventually lose the FileReaders downstream.

I first thought the garbage collector would take care of closing them when needed, but I've run into OS file descriptor exhaustion when filenames is a long Stream.

Jean-Baptiste Yunès asked Apr 19 '17 19:04



1 Answer

A general note on the use of FileReader: FileReader internally uses a FileInputStream, which overrides finalize(). Its use is therefore discouraged because of the impact on garbage collection, especially when dealing with lots of files.

Unless you're using a Java version prior to Java 7, you should use the java.nio.file API instead, creating a BufferedReader with

 Path path = Paths.get(filename);
 BufferedReader br = Files.newBufferedReader(path);
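As a minimal sketch of how a reader created this way can be opened and closed deterministically, the following wraps it in a try-with-resources block. The class name FirstLineDemo and the helper firstLine are hypothetical and only serve the illustration; the file is assumed to be UTF-8 text, which is what Files.newBufferedReader(Path) expects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

class FirstLineDemo {
    // Hypothetical helper: returns the first line of the file,
    // or null if the file is empty.
    static String firstLine(String filename) {
        Path path = Paths.get(filename);
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = Files.newBufferedReader(path)) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway file so the sketch can run on its own
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, Collections.singletonList("hello"));
        System.out.println(firstLine(tmp.toString()));
        Files.delete(tmp);
    }
}
```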

So the beginning of your stream pipeline should look more like

 filenames.map(Paths::get)
          .filter(Files::exists)
          .map(p -> {
              try {
                  return Optional.of(Files.newBufferedReader(p));
              } catch (IOException e) {
                  // explicit type argument makes the element type unambiguous
                  return Optional.<BufferedReader>empty();
              }
          })

Now to your problem:

Option 1

One way to preserve the original Reader is to use a tuple. A tuple (or any n-ary variation of it) is generally a good way to carry multiple results of a function application through a stream pipeline:

class ReaderTuple<T> {
   final Reader first;
   final T second;
   ReaderTuple(Reader r, T s){
     first = r;
     second = s;
   }
}

Now you can map the Reader to a tuple whose second item is your current stream item:

 filenames.map(Paths::get)
  .filter(Files::exists)
  .map(p -> {
        try {
            return Optional.of(Files.newBufferedReader(p));
        } catch (IOException e) {
            return Optional.<BufferedReader>empty();
        }
    })
  .filter(Optional::isPresent)
  .map(Optional::get)
  .map(r -> new ReaderTuple<>(r, yourOtherItem)) //map, not flatMap: one tuple per reader
  ....
  .peek(rt -> {
    try {
      rt.first.close(); //close the reader or use a try-with-resources
    } catch(Exception e){}
   })
  ...

The problem with this approach is that whenever an unchecked exception occurs during stream execution between the creation of the tuple and the peek, the readers might not be closed.
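The following sketch illustrates that weakness under simplified assumptions. Instead of real files it uses a hypothetical TrackingReader (a StringReader that records whether close() was called), and the failure is simulated by throwing an IllegalStateException for one element. None of these names come from the code above.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

class PeekCloseLeakDemo {
    // Hypothetical reader that records whether close() was called.
    static class TrackingReader extends StringReader {
        final String content;
        boolean closed = false;

        TrackingReader(String content) {
            super(content);
            this.content = content;
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    // Runs a pipeline whose intermediate stage fails on the second reader
    // and returns how many readers were never closed.
    static long countLeaked() {
        List<TrackingReader> readers = Arrays.asList(
                new TrackingReader("ok"),
                new TrackingReader("boom"),
                new TrackingReader("never reached"));
        try {
            readers.stream()
                    .map(r -> {
                        if (r.content.equals("boom")) {
                            throw new IllegalStateException("simulated failure");
                        }
                        return r;
                    })
                    .peek(TrackingReader::close) // only runs for elements that get this far
                    .forEach(r -> { });
        } catch (IllegalStateException expected) {
            // the pipeline aborted; the remaining readers never reached peek
        }
        return readers.stream().filter(r -> !r.closed).count();
    }

    public static void main(String[] args) {
        System.out.println(countLeaked() + " reader(s) left open");
    }
}
```

In this sequential run only the first reader reaches the peek stage, so the failing reader and the one after it stay open.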

Option 2

An alternative to using a tuple is to put the code that requires the reader in a try-with-resources block. This approach has the advantage that you control when every reader is closed.

Example 1:

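 filenames.map(Paths::get)
  .filter(Files::exists)
  .map(p -> {
        try (Reader r = Files.newBufferedReader(p)) { //FileReader has no constructor taking a Path

            //put your code that uses the reader here, e.g.
            //Stream.of(r)
            //.... ending in a terminal operation, so that all reading is done inside this block
            return Optional.of(result); //result is a placeholder for whatever you read

        } catch (IOException e) {
            return Optional.empty();
        }
    }) //reader is implicitly closed here
 .... //terminal operation here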

Example 2:

 filenames.map(Paths::get)
  .filter(Files::exists)
  .map(p -> {
        try {
            return Optional.of(Files.newBufferedReader(p));
        } catch (IOException e) {
            return Optional.<BufferedReader>empty();
        }
    })
 .filter(Optional::isPresent)
 .map(Optional::get)
 .flatMap(reader -> {
   try (Reader r = reader) {

      //read from your reader here and return the items to flatten;
      //read them completely before returning, since r is closed on exit

   } catch (IOException e) { //reading or the implicit close() may throw
      return Stream.empty();
   } //reader is implicitly closed here
  })

Example 1 has the advantage that the reader is guaranteed to be closed. Example 2 is safe unless you put something that may fail between the creation of the reader and the try-with-resources block.
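A runnable sketch of Example 2 could look like the following. The class name FlatMapCloseDemo and the method allLines are hypothetical, and the choice to collect the lines into a list is an assumption of this sketch: returning the lazy stream from r.lines() directly would hand flatMap a stream whose reader is already closed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapCloseDemo {
    // Hypothetical method: returns the lines of all readable files, in order.
    static List<String> allLines(Stream<Path> paths) {
        return paths
                .filter(Files::exists)
                .map(p -> {
                    try {
                        return Optional.of(Files.newBufferedReader(p));
                    } catch (IOException e) {
                        return Optional.<BufferedReader>empty();
                    }
                })
                .filter(Optional::isPresent)
                .map(Optional::get)
                .flatMap(reader -> {
                    try (BufferedReader r = reader) {
                        // Collect eagerly: the reader is closed when this block exits
                        return r.lines().collect(Collectors.toList()).stream();
                    } catch (IOException | UncheckedIOException e) {
                        // Skip files that fail while reading or closing
                        return Stream.<String>empty();
                    }
                })
                .collect(Collectors.toList());
    }
}
```

Nothing sits between opening the reader and the try-with-resources block here except the two Optional steps, which is the condition under which Example 2 is safe.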

I personally would go for Example 1, and put the code that accesses the reader in a separate function so that the code is more readable.
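One possible shape of that refactoring is sketched below. The names ReadInsideLambdaDemo, readLines and allLines are hypothetical, and reading whole files into lists is an assumption made to keep the sketch short; the point is only that the reader is opened and closed inside a single helper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReadInsideLambdaDemo {
    // Hypothetical helper: opens the file, reads all lines and closes the
    // reader before returning. Returns an empty Optional if reading fails.
    static Optional<List<String>> readLines(Path p) {
        try (BufferedReader r = Files.newBufferedReader(p)) {
            return Optional.of(r.lines().collect(Collectors.toList()));
        } catch (IOException | UncheckedIOException e) {
            return Optional.empty();
        }
    }

    // The pipeline never sees a reader, only the data that was read.
    static List<String> allLines(Stream<String> filenames) {
        return filenames.map(Paths::get)
                .filter(Files::exists)
                .map(ReadInsideLambdaDemo::readLines)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }
}
```

Because no open reader ever travels through the stream, a failure in a later stage cannot leave a file descriptor open.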

Gerald Mücke answered Oct 05 '22 16:10
