Reading chunks of a text file with a Java 8 Stream

Java 8 can create a Stream from the lines of a file (Files.lines), and forEach then steps through those lines one at a time. I have a text file in the following format:

bunch of lines with text
$$$$
bunch of lines with text
$$$$

I need to get each set of lines that goes before $$$$ into a single element in the Stream.

In other words, I need a Stream of Strings. Each string contains the content that goes before $$$$.

What is the best way (with minimum overhead) to do this?

lochi asked Oct 10 '16 06:10





2 Answers

I couldn't come up with a solution that processes the lines lazily. I'm not sure if this is possible.

My solution produces an ArrayList. If you have to use a Stream, simply call stream() on it.

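import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Stream;

public class DelimitedFile {
    public static void main(String[] args) throws IOException {
        List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
        for (int i = 0; i < lines.size(); i++) {
            System.out.printf("%d:%n%s%n", i, lines.get(i));
        }
    }

    public static List<String> lines(Path path, String delimiter) throws IOException {
        // try-with-resources closes the underlying file once collection finishes
        try (Stream<String> stream = Files.lines(path)) {
            // The accumulator is stateful, so this only works on a sequential stream
            return stream.collect(ArrayList::new, new BiConsumer<ArrayList<String>, String>() {
                // true when the next non-delimiter line starts a new chunk
                boolean add = true;

                @Override
                public void accept(ArrayList<String> lines, String line) {
                    if (delimiter.equals(line)) {
                        add = true;
                    } else if (add) {
                        lines.add(line);
                        add = false;
                    } else {
                        // append the line to the current (last) chunk
                        int i = lines.size() - 1;
                        lines.set(i, lines.get(i) + '\n' + line);
                    }
                }
            }, ArrayList::addAll);
        }
    }
}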

File content:

bunch of lines with text
bunch of lines with text2
bunch of lines with text3
$$$$
2bunch of lines with text
2bunch of lines with text2
$$$$
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
$$$$

Output:

0:
bunch of lines with text
bunch of lines with text2
bunch of lines with text3
1:
2bunch of lines with text
2bunch of lines with text2
2:
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4

Edit:

I've finally come up with a solution which lazily generates the Stream:

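import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public static Stream<String> lines(Path path, String delimiter) throws IOException {
    Stream<String> lines = Files.lines(path);
    Iterator<String> iterator = lines.iterator();
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {
        // first line of the next chunk, buffered by hasNext()
        String nextLine;

        @Override
        public boolean hasNext() {
            if (nextLine != null) {
                return true;
            }
            // skip delimiter lines until a chunk starts
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (!delimiter.equals(line)) {
                    nextLine = line;
                    return true;
                }
            }
            // input exhausted: release the file
            lines.close();
            return false;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            StringBuilder sb = new StringBuilder(nextLine);
            nextLine = null;
            // append lines until the next delimiter or the end of the file
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (delimiter.equals(line)) {
                    break;
                }
                sb.append('\n').append(line);
            }
            return sb.toString();
        }
    }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false)
            // also release the file if the caller closes the stream before it is exhausted
            .onClose(lines::close);
}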

Coincidentally, this is very similar to the implementation of BufferedReader.lines() (which Files.lines(Path) uses internally in Java 8). It may incur less overhead to skip both of these methods and use Files.newBufferedReader(Path) with BufferedReader.readLine() directly.
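A minimal sketch of that last suggestion, reading with BufferedReader.readLine() directly instead of wrapping Files.lines. The class name ChunkReader and the method chunks are hypothetical names chosen for illustration. The sketch takes a BufferedReader (for a file, one could pass Files.newBufferedReader(path)) and, like the code above, skips delimiters that have no lines before them. Whether it is measurably faster has not been benchmarked here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, not part of the original answer.
final class ChunkReader {

    // Lazily groups the reader's lines into chunks separated by `delimiter` lines.
    static Stream<String> chunks(BufferedReader reader, String delimiter) {
        Iterator<String> iterator = new Iterator<String>() {
            private String chunk; // buffered next chunk, or null if none is buffered

            @Override
            public boolean hasNext() {
                if (chunk == null) {
                    chunk = readChunk();
                }
                return chunk != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = chunk;
                chunk = null;
                return result;
            }

            // Reads lines up to the next delimiter; returns null at end of input.
            private String readChunk() {
                try {
                    StringBuilder sb = null;
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (delimiter.equals(line)) {
                            if (sb != null) {
                                return sb.toString();
                            }
                            continue; // delimiter with nothing before it: skip
                        }
                        if (sb == null) {
                            sb = new StringBuilder(line);
                        } else {
                            sb.append('\n').append(line);
                        }
                    }
                    return sb == null ? null : sb.toString();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
        return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(iterator,
                    Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(() -> {
                try {
                    reader.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
    }

    public static void main(String[] args) {
        String text = "a\nb\n$$$$\nc\n$$$$\n";
        // try-with-resources runs the onClose handler and closes the reader
        try (Stream<String> s = chunks(new BufferedReader(new StringReader(text)), "$$$$")) {
            s.forEach(c -> System.out.println("[" + c + "]"));
        }
    }
}
```

Taking a BufferedReader rather than a Path keeps the sketch testable with in-memory input. The caller should use the returned stream in try-with-resources so the reader is closed even if the stream is not fully consumed.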

xehpuk answered Oct 21 '22 22:10



You can use a Scanner as an iterator and create the stream from it:

private static Stream<String> recordStreamOf(Readable source) {
    Scanner scanner = new Scanner(source);
    // useDelimiter takes a regex, and '$' is a regex anchor, so the literal
    // delimiter must be quoted (java.util.regex.Pattern)
    scanner.useDelimiter(Pattern.quote("$$$$"));
    return StreamSupport
        .stream(Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
        .onClose(scanner::close);
}

This will preserve the newlines in the chunks for further filtering or splitting.
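As a usage sketch: because the Scanner splits on the delimiter text alone, each chunk keeps the line breaks around it, and any text after the last delimiter (even a single newline) comes back as its own chunk. The example below trims each chunk and drops the empty ones. ScannerChunks and trimmedRecords are hypothetical names for illustration. Scanner.useDelimiter(String) interprets its argument as a regular expression, so the sketch quotes the delimiter with Pattern.quote.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical wrapper class for illustration.
final class ScannerChunks {

    // Scanner implements Iterator<String>, so it can back a Stream directly.
    static Stream<String> recordStreamOf(Readable source) {
        Scanner scanner = new Scanner(source);
        scanner.useDelimiter(Pattern.quote("$$$$")); // match "$$$$" literally
        return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(scanner,
                    Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(scanner::close);
    }

    // Strips the surrounding line breaks and drops chunks that are blank.
    static List<String> trimmedRecords(Readable source) {
        try (Stream<String> records = recordStreamOf(source)) {
            return records
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String text = "a\nb\n$$$$\nc\n$$$$\n";
        trimmedRecords(new StringReader(text))
            .forEach(r -> System.out.println("[" + r + "]"));
    }
}
```

Note that String.trim also removes leading and trailing spaces within a chunk. If that whitespace matters, strip only the line breaks instead.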

ArtGod answered Oct 21 '22 20:10
