
Java 8 Streams: Read file word by word

I use Java 8 streams a lot to process files but so far always line-by-line.

What I want is a function that takes a BufferedReader br, reads a specific number of words (separated by "\\s+"), and leaves the BufferedReader at the exact position where that number of words was reached.

Right now I have a version that reads the file line by line:

    final int[] wordCount = {20};
    br
          .lines()
          .map(l -> l.split("\\s+"))
          .flatMap(Arrays::stream)
          .filter(s -> {
              //Process s
              if(--wordCount[0] == 0) return true;
              return false;
          }).findFirst();

This obviously leaves the input stream positioned at the start of the line after the one containing the 20th word.
Is there a way to get a stream that reads less than a line from the input stream?

EDIT
I am parsing a file where the first word contains the number of following words. I read this word and then accordingly read in the specific number of words. The file contains multiple such sections, where each section is parsed in the described function.

Having read all the helpful comments, it is clear to me that a Scanner is the right choice for this problem, and that Java 9 will add stream features to the Scanner class (Scanner.tokens() and Scanner.findAll()).
Using streams the way I described gives no guarantee that the reader will be at a specific position after the stream's terminal operation (see the API docs). That makes streams the wrong choice for parsing a structure where you parse only one section at a time and have to keep track of the position.
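
A minimal sketch of the Scanner approach, assuming the format described in the EDIT (a word count followed by that many words, repeated). The class name SectionParser, the method parseSections and the sample input are made up for illustration. Note that a Scanner buffers its input, so the underlying reader is not left at the word boundary; the sketch therefore keeps one Scanner for all sections instead of going back to the reader.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question.
class SectionParser {

    /** Parses sections of the form "count word1 ... wordCount" until no count is left. */
    static List<List<String>> parseSections(Reader reader) {
        List<List<String>> sections = new ArrayList<>();
        // The default Scanner delimiter is whitespace, so tokens are words.
        Scanner scanner = new Scanner(reader);
        while (scanner.hasNextInt()) {
            int count = scanner.nextInt();
            List<String> words = new ArrayList<>();
            // Stop early if the input ends before the section is complete.
            for (int i = 0; i < count && scanner.hasNext(); i++) {
                words.add(scanner.next());
            }
            sections.add(words);
        }
        return sections;
    }

    public static void main(String[] args) {
        // Made-up sample input: a section may span a line break.
        String input = "2 a b 3 c\nd e";
        parseSections(new StringReader(input)).forEach(System.out::println);
    }
}
```

The same Scanner instance has to be passed to whatever code parses the next section, because any characters it has already buffered are no longer available from the reader.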

Tobi asked Feb 08 '16 17:02




1 Answer

Regarding your original problem: I assume your file looks like this:

5 a section of five words 3 three words
section 2 short section 7 this section contains a lot 
of words

And you want to get the output like this:

[a, section, of, five, words]
[three, words, section]
[short, section]
[this, section, contains, a, lot, of, words]

In general, the Stream API is poorly suited to such problems; a plain old loop looks like the better solution here. If you still want to see a Stream API based solution, I can suggest my StreamEx library, which contains a headTail() method that lets you easily write custom stream-transformation logic. Here's how your problem could be solved using headTail:

import java.util.ArrayList;
import java.util.List;
import one.util.streamex.StreamEx;

/* Transform Stream of words like 2, a, b, 3, c, d, e to
   Stream of lists like [a, b], [c, d, e] */
public static StreamEx<List<String>> records(StreamEx<String> input) {
    return input.headTail((count, tail) -> 
        makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));
}

private static StreamEx<List<String>> makeRecord(StreamEx<String> input, int count, 
                                                 List<String> buf) {
    return input.headTail((head, tail) -> {
        buf.add(head);
        return buf.size() == count 
                ? records(tail).prepend(buf)
                : makeRecord(tail, count, buf);
    });
}

Usage example:

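// Additional imports needed by this usage example:
// java.io.Reader, java.io.StringReader,
// java.util.regex.Pattern, java.util.stream.Stream
String s = "5 a section of five words 3 three words\n"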
        + "section 2 short section 7 this section contains a lot\n"
        + "of words";
Reader reader = new StringReader(s);
Stream<List<String>> stream = records(StreamEx.ofLines(reader)
               .flatMap(Pattern.compile("\\s+")::splitAsStream));
stream.forEach(System.out::println);

The result matches the desired output above. Replace reader with your BufferedReader or FileReader to read from the input file. The stream of records is lazy: at most one record is held by the stream at a time, and if you short-circuit, the rest of the input will not be read (although the current line of the file will of course be read to the end). The solution, while it looks recursive, does not eat stack or heap, so it works for huge files as well.
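
If the reader really has to stop right after the last requested word, rather than at the end of the current line, one option outside the Stream API is to read single characters and push the trailing delimiter back with mark()/reset(). The following is only a sketch of that idea: the class name WordReader and the method readWords are hypothetical, and Character.isWhitespace is used as an approximation of the "\\s+" separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of StreamEx or the JDK.
class WordReader {

    /**
     * Reads up to n whitespace-separated words. After the n-th word the reader
     * is positioned on the delimiter that follows it, so nothing else is consumed.
     */
    static List<String> readWords(BufferedReader br, int n) throws IOException {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        while (words.size() < n) {
            br.mark(1);                  // remember the position before the next char
            int c = br.read();
            if (c == -1 || Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                    if (words.size() == n && c != -1) {
                        br.reset();      // un-read the delimiter after the last word
                    }
                }
                if (c == -1) {
                    break;               // end of input: return what we have
                }
            } else {
                current.append((char) c);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new StringReader("one two three\nfour five"));
        System.out.println(readWords(br, 2)); // [one, two]
        System.out.println(readWords(br, 2)); // [three, four]
        System.out.println(br.readLine());    // " five" (leading delimiter still unread)
    }
}
```

Reading one character at a time is cheap here because BufferedReader serves the characters from its internal buffer; the cost is that the word-splitting logic has to be written by hand.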


Explanation:

The headTail() method takes a two-argument lambda which is executed at most once during the execution of the outer stream's terminal operation, when a stream element is requested. The lambda receives the first stream element (head) and a stream containing all the remaining elements (tail). The lambda should return a new stream, which will be used instead of the original one. In records we have:

return input.headTail((count, tail) -> 
    makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));

The first element of the input is count: convert it to a number, create an empty ArrayList and call makeRecord for the tail. Here's the implementation of the makeRecord helper method:

return input.headTail((head, tail) -> {

The first stream element is head; add it to the current buffer:

    buf.add(head);

Has the target buffer size been reached?

    return buf.size() == count 

If yes, call records for the tail again (to process the next record, if any) and prepend a single element, the current buffer, to the resulting stream.

            ? records(tail).prepend(buf)

Otherwise, call makeRecord again for the tail (to add more elements to the buffer).

            : makeRecord(tail, count, buf);
});
Tagir Valeev answered Sep 19 '22 12:09