I have a text file that contains multiple reports. Each report starts with the literal "REPORT ID" followed by a specific value, e.g. ABCD. In the simple case, I want to extract the data of only those reports whose value is ABCD. In the more complex case, I want to extract the data of only those reports whose TAG1 value (2nd line) is 1000375351 and whose report value is ABCD.
I have done it the traditional way. My decideAndExtract(String line) function has the required logic. But how can I use the Java 9 stream methods takeWhile and dropWhile to deal with it efficiently?
try (Stream<String> lines = Files.lines(filePath)) {
    lines.forEach(this::decideAndExtract);
}
Sample text file data:
REPORT ID: ABCD
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3 : 1000640
Some Lines Here
REPORT ID: WXYZ
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3 : 1000640
Some Lines Here
REPORT ID: ABCD
TAG1: 1000375351 PR
DATA1: 7399910002 T
DATA2: 4754400002 B
DATA3 : 1000640
Some Lines Here
Before Java 9, there was no way to pick or drop a subsequence of a stream based on a condition. Java 9 added two methods to the Stream interface, takeWhile and dropWhile. takeWhile is particularly useful for infinite streams, because it ends the stream at the first element that does not match.
The takeWhile(java.util.function.Predicate) method returns, if the stream is ordered, a stream consisting of the longest prefix of elements taken from this stream that match the given predicate. If the stream is unordered, it returns a stream consisting of a subset of the elements that match the given predicate.
dropWhile is the opposite of takeWhile: it drops the leading elements that match the predicate instead of taking them. As soon as it reaches an element that does not match the predicate, it includes that element and all remaining elements in the returned stream.
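To make the prefix semantics concrete, here is a minimal sketch (requires Java 9+) that combines dropWhile and takeWhile to cut one report out of a list of lines. The class name FirstReportDemo and the helper firstReport are hypothetical, not part of the question's code. Note the limitation it illustrates: because both methods work on a single prefix, this approach yields only the first matching report, not every ABCD report in the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FirstReportDemo {

    // Hypothetical helper: returns the lines of the FIRST report with the given id
    // (header line included), or an empty list if there is no such report.
    static List<String> firstReport(List<String> lines, String id) {
        String header = "REPORT ID: " + id;

        // dropWhile discards the prefix of lines that precede the wanted header.
        List<String> fromHeader = lines.stream()
                .dropWhile(line -> !line.trim().equals(header))
                .collect(Collectors.toList());
        if (fromHeader.isEmpty()) {
            return fromHeader; // no report with that id
        }

        // takeWhile keeps the lines after the header up to the next report header.
        List<String> report = new ArrayList<>();
        report.add(fromHeader.get(0));
        fromHeader.stream()
                .skip(1)
                .takeWhile(line -> !line.startsWith("REPORT ID"))
                .forEach(report::add);
        return report;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "REPORT ID: WXYZ", "TAG1: 1000375351 PR",
                "REPORT ID: ABCD", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
                "REPORT ID: ABCD", "TAG1: 1000375351 PR");
        // Only the first ABCD report is returned; the second one is never reached.
        firstReport(lines, "ABCD").forEach(System.out::println);
    }
}
```

Getting all matching reports this way would mean re-applying the two operations in a loop, which is one reason a pattern-based approach tends to fit this problem better.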
It seems to be a common anti-pattern to go for Files.lines whenever a Stream over a file is needed, regardless of whether processing individual lines is actually needed.
The first tool of your choice, when pattern matching over a file is needed, should be Scanner:
Pattern p = Pattern.compile(
    "REPORT ID: ABCD\\s*\\R"
    +"TAG1\\s*:\\s*(.*?)\\R"
    +"DATA1\\s*:\\s*(.*?)\\R"
    +"DATA2\\s*:\\s*(.*?)\\R"
    +"DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field
try(Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
        Stream<MatchResult> st = sc.findAll(p)) {
    st.forEach(mr -> System.out.println("found tag1: " + mr.group(1)
        + ", data: "+String.join(", ", mr.group(2), mr.group(3), mr.group(4))));
}
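For experimenting without a file, the same pattern can be run against an in-memory string, since Scanner also accepts a String source. The following is a self-contained sketch (Java 9+); the class name ScannerFindAllDemo and the helper tag1Values are made up for illustration, and the sample text is a shortened version of the data from the question.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ScannerFindAllDemo {

    // Same pattern as above, kept in a static final field.
    static final Pattern ABCD_REPORT = Pattern.compile(
            "REPORT ID: ABCD\\s*\\R"
            + "TAG1\\s*:\\s*(.*?)\\R"
            + "DATA1\\s*:\\s*(.*?)\\R"
            + "DATA2\\s*:\\s*(.*?)\\R"
            + "DATA3\\s*:\\s*(.*?)\\R");

    // Hypothetical helper: collects the TAG1 value of every ABCD report in the text.
    static List<String> tag1Values(String text) {
        try (Scanner sc = new Scanner(text)) {
            // The stream is lazy, so it has to be consumed before the Scanner is closed.
            return sc.findAll(ABCD_REPORT)
                     .map(mr -> mr.group(1))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "REPORT ID: ABCD", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
                "DATA2: 4754400002 B", "DATA3 : 1000640", "Some Lines Here",
                "REPORT ID: WXYZ", "TAG1: 1000375351 PR", "DATA1: 7399910002 T",
                "DATA2: 4754400002 B", "DATA3 : 1000640", "Some Lines Here") + "\n";
        // The WXYZ report is skipped because the pattern hard-codes the ABCD id.
        System.out.println(tag1Values(text));
    }
}
```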
It's easy to adapt the pattern, e.g. use
Pattern p = Pattern.compile(
    "REPORT ID: ABCD\\s*\\R"
    +"TAG1: (1000375351 PR)\\R"
    +"DATA1\\s*:\\s*(.*?)\\R"
    +"DATA2\\s*:\\s*(.*?)\\R"
    +"DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field
as the pattern to fulfill your more complex criteria.
But you could also provide arbitrary filter conditions in the Stream:
Pattern p = Pattern.compile(
    "REPORT ID: (.*?)\\s*\\R"
    +"TAG1: (.*?)\\R"
    +"DATA1\\s*:\\s*(.*?)\\R"
    +"DATA2\\s*:\\s*(.*?)\\R"
    +"DATA3\\s*:\\s*(.*?)\\R"); // you can keep this in a static final field
try(Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
        Stream<MatchResult> st = sc.findAll(p)) {
    st.filter(mr -> mr.group(1).equals("ABCD") && mr.group(2).equals("1000375351 PR"))
      .forEach(mr -> System.out.println(
        "found data: " + String.join(", ", mr.group(3), mr.group(4), mr.group(5))));
}
allowing more complex constructs than the equals calls of the example. (Note that the group numbers changed for this example.)
E.g., to support a variable order of the data items after the “REPORT ID”, you can use
Pattern p = Pattern.compile("REPORT ID: (.*?)\\s*\\R(((TAG1|DATA[1-3])\\s*:.*?\\R){4})");
Pattern nl = Pattern.compile("\\R"), sep = Pattern.compile("\\s*:\\s*");
try(Scanner sc = new Scanner(filePath, StandardCharsets.UTF_8);
        Stream<MatchResult> st = sc.findAll(p)) {
    st.filter(mr -> mr.group(1).equals("ABCD"))
      .map(mr -> nl.splitAsStream(mr.group(2))
          .map(s -> sep.split(s, 2))
          .collect(Collectors.toMap(a -> a[0], a -> a[1])))
      .filter(map -> "1000375351 PR".equals(map.get("TAG1")))
      .forEach(map -> System.out.println("found data: " + map));
}
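findAll is available in Java 9, but if you have to support Java 8, you can use the findAll implementation of this answer. (Note that the Scanner(Path, Charset) constructor used above was only added in Java 10; on older versions, pass the charset name instead, e.g. new Scanner(filePath, "UTF-8").)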
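If the file is small enough to fit in memory, a simpler Java 8 fallback is to read it into a String and loop over Matcher.find(). The sketch below is not the implementation referred to above, just a plain substitute; the class name FindAllFallback and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllFallback {

    // Hypothetical stand-in for Scanner.findAll: eagerly collects every match
    // of the pattern in an input that is already fully loaded into memory.
    static List<MatchResult> findAll(Pattern pattern, CharSequence input) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            // toMatchResult() takes a snapshot that stays valid after the next find()
            results.add(m.toMatchResult());
        }
        return results;
    }
}
```

It could be called with something like findAll(p, new String(Files.readAllBytes(filePath), StandardCharsets.UTF_8)), and the resulting list can be streamed and filtered the same way as in the examples above. Unlike Scanner, this reads the whole file up front, so it is only suitable when the file size is known to be moderate.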