I have a CSV file and the first line contains the headers. So I thought it would be perfect to use Java 8 streams.
try (Stream<String> stream = Files.lines(csv_file)) {
    stream.skip(1).forEach(line -> handleLine(line));
} catch (IOException ioe) {
    handleError(ioe);
}
Is it possible to take the first element, analyze it and then call the forEach method? Something like
stream
    .forFirst(line -> handleFirst(line))
    .skip(1)
    .forEach(line -> handleLine(line));
ADDITIONALLY: My CSV file contains around 1k lines, and I could handle each line in parallel to speed things up — except for the first line. I need the first line to initialize other objects in my project :/ So maybe it would be fast to open a BufferedReader, read the first line, close the BufferedReader, and then use parallel streams?
In general, you can use iterators to do this:
Stream<Item> stream = ... // initialize your stream
Iterator<Item> i = stream.iterator();
handleFirst(i.next());
i.forEachRemaining(item -> handleRest(item));
In your program, it would look something like this:
try (Stream<String> stream = Files.lines(csv_file)) {
    Iterator<String> i = stream.iterator();
    handleFirst(i.next());
    i.forEachRemaining(s -> handleRest(s));
}
You may want to add some error checking for the case where the file has zero lines — calling i.next() on an empty stream throws a NoSuchElementException — but otherwise this should work.
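Here is a minimal, self-contained sketch of the iterator approach with that error check added. The class name, the temp file, and the header/rows fields are just for illustration; they stand in for the question's handleFirst/handleRest calls:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class FirstLineDemo {
    // Stand-ins for whatever handleFirst/handleRest would do with the lines.
    static String header;
    static final List<String> rows = new ArrayList<>();

    public static void main(String[] args) throws IOException {
        // Illustrative input: a temp file with a header row and two data rows.
        Path csvFile = Files.createTempFile("demo", ".csv");
        Files.write(csvFile, List.of("id,name", "1,alice", "2,bob"));

        try (Stream<String> stream = Files.lines(csvFile)) {
            Iterator<String> i = stream.iterator();
            if (!i.hasNext()) {
                throw new IllegalStateException("empty file: no header row");
            }
            header = i.next();              // first line = header
            i.forEachRemaining(rows::add);  // zero or more remaining lines
        }

        System.out.println(header);
        System.out.println(rows.size());
    }
}
```

The hasNext() guard handles the empty-file case; a file containing only a header simply leaves the remaining iteration with nothing to do.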
A nice way to do that would be to get a BufferedReader reading your file, for example with the help of Files.newBufferedReader(path). Then you can call readLine() once to retrieve the header row, and lines() to get a Stream<String> of all the other rows:
try (BufferedReader br = Files.newBufferedReader(csv_file)) {
    String header = br.readLine();
    // if header is null, the file was empty; you may want to throw an exception
    br.lines().forEach(line -> handleLine(line));
}
This works because the first call to readLine() advances the buffered reader past the first line; since lines() is a stream populated by reading from that same reader, it starts at the second line. The buffered reader is also correctly closed by the try-with-resources block when the processing ends.
Potentially, the stream pipeline could be run in parallel, but for an I/O-bound task like this one I wouldn't expect any performance improvement, unless it is the processing of each row that is the slower part. Be careful with forEach in that case: it will be run concurrently, so its code needs to be thread-safe. It's unclear what the handleLine method does, but in general you do not need forEach and might prefer a mutable reduction with collect, which is safe to use in a parallel stream.
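As a sketch of that collect-based variant: the example below reads the header with readLine(), then parses the remaining rows in a parallel pipeline. The per-row work (a simple split(",")), the class name, and the temp file are hypothetical stand-ins for handleLine; the point is that collect does the thread-safe accumulation, so the pipeline touches no shared mutable state:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelCollectDemo {
    static List<String[]> parsed;

    public static void main(String[] args) throws IOException {
        // Illustrative input: header plus two data rows.
        Path csvFile = Files.createTempFile("demo", ".csv");
        Files.write(csvFile, List.of("id,name", "1,alice", "2,bob"));

        try (BufferedReader br = Files.newBufferedReader(csvFile)) {
            String header = br.readLine();  // consume the header row first
            if (header == null) {
                throw new IOException("empty file: no header row");
            }
            parsed = br.lines()
                       .parallel()
                       .map(line -> line.split(","))  // per-row work, done concurrently
                       .collect(Collectors.toList()); // thread-safe accumulation
        }
        System.out.println(parsed.size());
    }
}
```

Note that Collectors.toList() preserves encounter order even in a parallel stream, so the rows come out in file order despite the concurrent mapping.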