Java 9 takeWhile and dropWhile to read and skip certain lines

I have a text file that contains multiple reports. Each report starts with the literal "REPORT ID" and has a specific value, e.g. ABCD. In the simple case, I want to extract the data of only those reports whose value is ABCD. In the more complex case, I want to extract the data of only those reports whose TAG1 value (2nd line) is 1000375351 and whose report value is ABCD.

I have done it the traditional way; my decideAndExtract(String line) function has the required logic. But how can I use the Java 9 stream methods takeWhile and dropWhile to deal with it efficiently?

try (Stream<String> lines = Files.lines(filePath)) {
    lines.forEach(this::decideAndExtract);
}

Sample text file data:

REPORT ID: ABCD    
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3     : 1000640
Some Lines Here    
REPORT ID: WXYZ    
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3     : 1000640
Some Lines Here    
REPORT ID: ABCD    
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3     : 1000640
Some Lines Here
Tishy Tash asked Aug 02 '19 19:08


1 Answer

It seems to be a common anti-pattern to go for Files.lines whenever a Stream over a file is needed, regardless of whether line-by-line processing is what the task actually requires.
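
To illustrate why takeWhile and dropWhile alone are a poor fit here, consider the following minimal sketch. Each of the two operations acts once on a prefix of the stream, so a single pipeline can isolate the first matching report but never reaches later ones. The class name, the helper firstReportBody and the shortened sample lines are assumptions made for this illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TakeDropWhileDemo {

    // Hypothetical helper: returns the body lines of the FIRST report
    // whose header line is "REPORT ID: <id>" (ignoring surrounding whitespace).
    static List<String> firstReportBody(List<String> lines, String id) {
        String header = "REPORT ID: " + id;
        return lines.stream()
            .dropWhile(line -> !line.trim().equals(header))    // discard everything before the first matching header
            .skip(1)                                            // drop the header line itself
            .takeWhile(line -> !line.startsWith("REPORT ID"))   // keep lines until the next report begins
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "REPORT ID: WXYZ", "TAG1: 1",
            "REPORT ID: ABCD", "TAG1: 2", "DATA1: x",
            "REPORT ID: ABCD", "TAG1: 3");
        // Only the first ABCD report is returned; the second one is never reached.
        System.out.println(firstReportBody(lines, "ABCD"));
    }
}
```

Collecting every matching report this way would require restarting the pipeline after each block, which is the kind of stateful line handling that pattern matching over the whole input avoids.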

When you need pattern matching over a file, the first tool to reach for should be Scanner:

Pattern p = Pattern.compile(
    "REPORT ID: ABCD\\s*\\R"
   +"TAG1\\s*:\\s*(.*?)\\R"
   +"DATA1\\s*:\\s*(.*?)\\R"
   +"DATA2\\s*:\\s*(.*?)\\R"
   +"DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field

try(Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
    Stream<MatchResult> st = sc.findAll(p)) {

    st.forEach(mr -> System.out.println("found tag1: " + mr.group(1)
        + ", data: "+String.join(", ", mr.group(2), mr.group(3), mr.group(4))));
}
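
For experimenting without a file on disk, the same pattern can be run against an in-memory string. The following is a self-contained sketch under that assumption; the class name ReportScanDemo, the extract method, the SAMPLE constant and the " | " output format are hypothetical, while the pattern is the one shown above and the sample text is taken from the question.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ReportScanDemo {

    // Same pattern as above, kept in a static final field.
    static final Pattern REPORT = Pattern.compile(
        "REPORT ID: ABCD\\s*\\R"
      + "TAG1\\s*:\\s*(.*?)\\R"
      + "DATA1\\s*:\\s*(.*?)\\R"
      + "DATA2\\s*:\\s*(.*?)\\R"
      + "DATA3\\s*:\\s*(.*?)\\R");

    // Sample data from the question: two ABCD reports and one WXYZ report.
    static final String SAMPLE = String.join("\n",
        "REPORT ID: ABCD    ", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
        "DATA2: 4754400002 B", "DATA3     : 1000640", "Some Lines Here    ",
        "REPORT ID: WXYZ    ", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
        "DATA2: 4754400002 B", "DATA3     : 1000640", "Some Lines Here    ",
        "REPORT ID: ABCD    ", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
        "DATA2: 4754400002 B", "DATA3     : 1000640", "Some Lines Here");

    // Returns one "tag1 | data1, data2, data3" string per matching report.
    static List<String> extract(String text) {
        try (Scanner sc = new Scanner(text)) {
            return sc.findAll(REPORT)
                .map(mr -> mr.group(1) + " | "
                    + String.join(", ", mr.group(2), mr.group(3), mr.group(4)))
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        extract(SAMPLE).forEach(System.out::println);
    }
}
```

The stream is consumed inside the try block, before the Scanner is closed, which matters because findAll produces its matches lazily.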

It's easy to adapt the pattern, i.e. use

Pattern p = Pattern.compile(
    "REPORT ID: ABCD\\s*\\R"
   +"TAG1: (1000375351 PR)\\R"
   +"DATA1\\s*:\\s*(.*?)\\R"
   +"DATA2\\s*:\\s*(.*?)\\R"
   +"DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field

as the pattern to fulfill your more complex criteria.

But you could also provide arbitrary filter conditions in the Stream:

Pattern p = Pattern.compile(
    "REPORT ID: (.*?)\\s*\\R"
   +"TAG1: (.*?)\\R"
   +"DATA1\\s*:\\s*(.*?)\\R"
   +"DATA2\\s*:\\s*(.*?)\\R"
   +"DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field

try(Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
    Stream<MatchResult> st = sc.findAll(p)) {

    st.filter(mr -> mr.group(1).equals("ABCD") && mr.group(2).equals("1000375351 PR"))
      .forEach(mr -> System.out.println(
          "found data: " + String.join(", ", mr.group(3), mr.group(4), mr.group(5))));
}

This allows more complex conditions than the equals calls of the example. (Note that the group numbers changed for this example: group 1 is now the report ID and group 2 the TAG1 value, so the data items are groups 3 to 5.)

E.g., to support a variable order of the data items after the “REPORT ID”, you can use

Pattern p = Pattern.compile("REPORT ID: (.*?)\\s*\\R(((TAG1|DATA[1-3])\\s*:.*?\\R){4})");
Pattern nl = Pattern.compile("\\R"), sep = Pattern.compile("\\s*:\\s*");

try(Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
    Stream<MatchResult> st = sc.findAll(p)) {

    st.filter(mr -> mr.group(1).equals("ABCD"))
      .map(mr -> nl.splitAsStream(mr.group(2))
          .map(s -> sep.split(s, 2))
          .collect(Collectors.toMap(a -> a[0], a -> a[1])))
      .filter(map -> "1000375351 PR".equals(map.get("TAG1")))
      .forEach(map -> System.out.println("found data: " + map));
}

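findAll is available since Java 9. Note that the Scanner(Path, Charset) constructor used above was only added in Java 10; on Java 9, pass the charset name as a String instead, e.g. StandardCharsets.UTF_8.name(). If you have to support Java 8, you can use the findAll implementation of this answer.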
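
One possible shape for such a Java 8 fallback is sketched below. It is an illustration built on Scanner.findWithinHorizon and is not necessarily what the linked answer does; the class name ScannerCompat is hypothetical.

```java
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ScannerCompat {

    // Lazily streams all matches of the pattern, using only Java 8 APIs.
    static Stream<MatchResult> findAll(Scanner sc, Pattern p) {
        return StreamSupport.stream(
            new Spliterators.AbstractSpliterator<MatchResult>(
                    Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                @Override
                public boolean tryAdvance(Consumer<? super MatchResult> action) {
                    // Horizon 0 means "search the rest of the input, ignoring delimiters".
                    if (sc.findWithinHorizon(p, 0) == null) {
                        return false;
                    }
                    MatchResult mr = sc.match();
                    // Assumption: match() may expose the scanner's live Matcher,
                    // so take an immutable snapshot when that is the case.
                    action.accept(mr instanceof Matcher ? ((Matcher) mr).toMatchResult() : mr);
                    return true;
                }
            }, false);
    }
}
```

With this helper, sc.findAll(p) in the examples above becomes ScannerCompat.findAll(sc, p); the rest of each pipeline stays the same.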

Holger answered Oct 11 '22 19:10