
Java 8 streams: take first and then call forEach(...)

I have a CSV file and the first line contains the headers. So I thought it would be perfect to use Java 8 streams.

    try (Stream<String> stream = Files.lines(csv_file) ){
        stream.skip(1).forEach( line -> handleLine(line) );
    } catch ( IOException ioe ){
        handleError(ioe);
    }

Is it possible to take the first element, analyze it and then call the forEach method? Something like

stream
      .forFirst( line -> handleFirst(line) )
      .skip(1)
      .forEach( line -> handleLine(line) );

ADDITIONALLY: My CSV file contains around 1k lines, and I can handle each line in parallel to speed things up, except for the first line. I need the first line to initialize other objects in my project :/ So maybe it is faster to open a BufferedReader, read the first line, close the BufferedReader and then use parallel streams?

Highchiller asked Nov 30 '16 21:11


2 Answers

In general, you can use iterators to do this:

Stream<Item> stream = ... //initialize your stream
Iterator<Item> i = stream.iterator();
handleFirst(i.next());
i.forEachRemaining(item -> handleRest(item));

In your program, it would look something like this:

try (Stream<String> stream = Files.lines(csv_file)){
    Iterator<String> i = stream.iterator();
    handleFirst(i.next());
    i.forEachRemaining(s -> handleRest(s));
}

You may want to add some error checking for files with only one line or none at all (on an empty file, i.next() throws NoSuchElementException), but this should work.
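As a minimal sketch of that error checking, the iterator approach can be wrapped in a small helper that looks before it reads. The class name FirstThenRest and the method firstThenRest are hypothetical names chosen for this illustration, not part of any library; the sample rows in main are made up as well.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

class FirstThenRest {
    /**
     * Passes the first element to onFirst and every later element to onRest.
     * Returns false, calling neither handler, when the stream is empty.
     */
    static <T> boolean firstThenRest(Stream<T> stream,
                                     Consumer<? super T> onFirst,
                                     Consumer<? super T> onRest) {
        Iterator<T> it = stream.iterator();
        if (!it.hasNext()) {
            return false; // empty input: there is no header to handle
        }
        onFirst.accept(it.next());
        it.forEachRemaining(onRest); // does nothing if only the header was present
        return true;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        boolean hadHeader = firstThenRest(
                Stream.of("id,name", "1,Ann", "2,Bob"),
                header -> System.out.println("header: " + header),
                rows::add);
        System.out.println(hadHeader + " " + rows);
    }
}
```

With Files.lines, the call would still sit inside the try-with-resources block so that the file is closed. Note that going through an iterator processes the elements sequentially.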

CodeBlind answered Oct 16 '22 09:10


A nice way to do that would be to open a BufferedReader on your file, for example with the help of Files.newBufferedReader(path). Then you can call readLine() once to retrieve the header row, and lines() to get a Stream<String> of all the other rows:

try (BufferedReader br = Files.newBufferedReader(csv_file)){
    String header = br.readLine();
    // if header is null, the file was empty, you may want to throw an exception
    br.lines().forEach(line -> handleLine(line));
}

This works because the first call to readLine() will cause the buffered reader to read the first line, so subsequently, since lines() is a stream populated by reading the lines, it starts reading at the second line. The buffered reader is also correctly closed by the try-with-resources when the processing ends.

Potentially, the stream pipeline could be run in parallel, but for I/O-bound tasks like this one I wouldn't expect any performance improvement, unless processing each row is the slower part. Be careful with forEach in that case: it will be run concurrently, so its code needs to be thread-safe. It's unclear what the handleLine method does, but generally you do not need forEach and might prefer a mutable reduction with collect, which is safe to use in a parallel stream.
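A rough sketch of the collect variant follows, assuming each row can be parsed independently. ParallelRows, parseRow and readRows are hypothetical names for this illustration, and parseRow is only a stand-in for whatever handleLine really does; its naive split on commas ignores quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelRows {
    /** Hypothetical stand-in for handleLine: pure function, touches no shared state. */
    static List<String> parseRow(String line) {
        return Arrays.asList(line.split(","));
    }

    /** Reads the header sequentially, then parses the remaining rows in parallel. */
    static List<List<String>> readRows(BufferedReader br) {
        try {
            String header = br.readLine();
            if (header == null) {
                throw new IllegalArgumentException("empty input: no header row");
            }
            // Anything that depends on the header would be initialised here,
            // before the parallel part of the pipeline starts.
            return br.lines()
                     .parallel()
                     .map(ParallelRows::parseRow)
                     .collect(Collectors.toList()); // keeps encounter order, no shared mutable state
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new StringReader("id,name\n1,Ann\n2,Bob"));
        System.out.println(readRows(br));
    }
}
```

For a real file, the reader would come from Files.newBufferedReader(csv_file) inside a try-with-resources block, as in the snippet above. Whether parallel() pays off for roughly 1k lines depends on how expensive the per-row work is, so it is worth measuring.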

Tunaki answered Oct 16 '22 10:10