How can I perform multiple unrelated operations on elements of a single stream?
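Say I have a List<String> composed from a text. Each string in the list may or may not contain a certain word, which represents an action to perform. Let's say that:
- if the string contains "of", I need the number of words in that string, collected into a List<Integer>;
- if the string contains "for", I need the portion of the string starting at the first occurrence of "for", collected into a List<String> with all substrings.
Of course, I could do something like this: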
List<String> strs = ...;
List<Integer> wordsInStr = strs.stream()
        .filter(t -> t.contains("of"))
        .map(t -> t.split(" ").length)
        .collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream()
        .filter(t -> t.contains("for"))
        .map(t -> t.substring(t.indexOf("for")))
        .collect(Collectors.toList());
but then the list would be traversed twice, which could result in a performance penalty if strs contained lots of elements.
Is it possible to somehow execute those two operations without traversing twice over the list?
A stream can be composed of multiple functions that form a pipeline through which the data flows. The pipeline does not mutate the source; that is to say, the original data structure doesn't change. However, the data can be transformed and later stored in another data structure or consumed by another operation.
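As a minimal sketch of that idea (the class name PipelineDemo and the sample strings are made up for illustration), a filter/map pipeline produces a new list and leaves the source list as it was:

```java
import java.util.List;
import java.util.stream.Collectors;

public class PipelineDemo {
    // Upper-cases the lines that contain "of"; the source list is not modified.
    static List<String> upperCaseOfLines(List<String> source) {
        return source.stream()
                .filter(s -> s.contains("of"))
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strs = List.of("a cup of tea", "time for lunch");
        System.out.println(upperCaseOfLines(strs)); // [A CUP OF TEA]
        System.out.println(strs);                   // [a cup of tea, time for lunch]
    }
}
```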
If we want to split a stream in two, we can use partitioningBy from the Collectors class. It takes a Predicate and returns a Map that groups elements that satisfied the predicate under the Boolean true key and the rest under false.
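A small sketch of partitioningBy, assuming a hypothetical class PartitionDemo and made-up sample lines. Note that the two groups are disjoint: a string that contained both "of" and "for" would land only under the true key, so this alone may not cover the case where one string triggers both actions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo {
    // Splits the lines into those that contain "of" (true) and the rest (false).
    static Map<Boolean, List<String>> splitByOf(List<String> lines) {
        return lines.stream()
                .collect(Collectors.partitioningBy(s -> s.contains("of")));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> parts =
                splitByOf(List.of("a cup of tea", "time for lunch", "one of us"));
        System.out.println(parts.get(true));  // [a cup of tea, one of us]
        System.out.println(parts.get(false)); // [time for lunch]
    }
}
```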
count() is a method of the Stream interface that returns the number of elements in the stream. It is a terminal operation: it starts the internal iteration over the elements and counts them, and the stream cannot be used again after count() has been called.
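To illustrate the terminal-operation point, here is a hedged sketch (the class CountDemo and its method names are hypothetical). It counts matching lines, then shows that invoking a second terminal operation on the same Stream object throws IllegalStateException:

```java
import java.util.List;
import java.util.stream.Stream;

public class CountDemo {
    // Counts the lines that contain the given word.
    static long countContaining(List<String> lines, String word) {
        return lines.stream().filter(s -> s.contains(word)).count();
    }

    // Returns true if reusing a stream after count() fails, as expected.
    static boolean reuseFails(List<String> lines) {
        Stream<String> stream = lines.stream();
        stream.count(); // terminal operation: the stream is now consumed
        try {
            stream.count(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> strs = List.of("a cup of tea", "time for lunch", "one of us");
        System.out.println(countContaining(strs, "of")); // 2
        System.out.println(reuseFails(strs));            // true
    }
}
```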
If you want a single-pass Stream, then you have to use a custom Collector (parallelization is possible).
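import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class Splitter {
    public List<String> words = new ArrayList<>();
    public List<Integer> counts = new ArrayList<>();

    // Accumulator: called once per element. The two checks are independent,
    // so a string containing both "of" and "for" feeds both lists,
    // matching the two-pass version in the question.
    public void accept(String s) {
        if (s.contains("of")) {
            counts.add(s.split(" ").length);
        }
        if (s.contains("for")) {
            words.add(s.substring(s.indexOf("for")));
        }
    }

    // Combiner: merges partial results when the stream runs in parallel.
    public Splitter merge(Splitter other) {
        words.addAll(other.words);
        counts.addAll(other.counts);
        return this;
    }
}

Splitter collect = strs.stream().collect(
        Collector.of(Splitter::new, Splitter::accept, Splitter::merge)
);
System.out.println(collect.counts);
System.out.println(collect.words);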
Here is an answer that addresses the question from a different angle. First of all, let's look at how fast or slow it actually is to iterate over a list/collection. These are the results on my machine from the performance test below:
When: length of string list = 100, Thread number = 1, loops = 1000, unit = milliseconds
OP: 0.013
Accepted answer: 0.020
By the counter function: 0.010
When: length of string list = 1000_000, Thread number = 1, loops = 100, unit = milliseconds
OP: 99.387
Accepted answer: 89.848
By the counter function: 59.183
Conclusion: the performance improvement from the single-pass collector is small, and it can even be slower when the string list is short. In general, it is a mistake to replace repeated iteration over an in-memory list/collection with a more complicated collector just to save a pass; you won't gain much. If there is a performance issue, look for it somewhere else.
Here is my performance test code, using the Profiler tool. (I'm not going to discuss how to do a performance test here. If you doubt the results, you can repeat the test with any tool you trust.)
@Test
public void test_46539786() {
    final int strsLength = 1000_000;
    final int threadNum = 1;
    final int loops = 100;
    final int rounds = 3;
    final List<String> strs = IntStream.range(0, strsLength).mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i).toList();

    // Two passes over the list, as in the question.
    Profiler.run(threadNum, loops, rounds, "OP", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(t -> t.split(" ").length).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());
        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();

    // Single pass with the custom Splitter collector.
    Profiler.run(threadNum, loops, rounds, "Accepted answer", () -> {
        Splitter collect = strs.stream().collect(Collector.of(Splitter::new, Splitter::accept, Splitter::merge));
        assertTrue(collect.counts.size() == collect.words.size());
    }).printResult();

    // Cheaper replacement for split(" "): scans the characters without allocating an array.
    // Note that it returns the number of spaces, not split(" ").length.
    final Function<String, Integer> counter = s -> {
        int count = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    };

    // Still two passes, but with the cheaper counter function.
    Profiler.run(threadNum, loops, rounds, "By the counter function", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(counter).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());
        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();
}
You could use a custom collector for that and iterate only once:
// Pair is not a JDK type; this assumes a library class with a static of(...) factory,
// such as org.apache.commons.lang3.tuple.Pair.
private static Collector<String, ?, Pair<List<String>, List<Long>>> multiple() {

    // Mutable accumulator that holds both result lists.
    class Acc {
        List<String> strings = new ArrayList<>();
        List<Long> longs = new ArrayList<>();

        void add(String elem) {
            if (elem.contains("of")) {
                long howMany = Arrays.stream(elem.split(" ")).count();
                longs.add(howMany);
            }
            if (elem.contains("for")) {
                String result = elem.substring(elem.indexOf("for"));
                strings.add(result);
            }
        }

        // Combines partial results for parallel streams.
        Acc merge(Acc right) {
            longs.addAll(right.longs);
            strings.addAll(right.strings);
            return this;
        }

        public Pair<List<String>, List<Long>> finisher() {
            return Pair.of(strings, longs);
        }
    }

    return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}
Usage would be:
Pair<List<String>, List<Long>> pair = Stream.of("t of r m", "t of r m", "nice for nice nice again")
        .collect(multiple());
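As an alternative to writing the collector by hand, the same single traversal can probably be expressed with built-in collectors. The sketch below is not taken from any of the answers above: it assumes Java 12 or newer for Collectors.teeing (Collectors.filtering needs Java 9), and the class name TeeingDemo and the use of Map.Entry as the result holder are illustrative choices. teeing passes each element to both downstream collectors and merges their results at the end.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class TeeingDemo {
    // Key: word counts of lines containing "of"; value: tails of lines containing "for".
    static Map.Entry<List<Integer>, List<String>> process(List<String> strs) {
        Collector<String, ?, List<Integer>> wordCounts =
                Collectors.filtering(t -> t.contains("of"),
                        Collectors.mapping(t -> t.split(" ").length, Collectors.toList()));
        Collector<String, ?, List<String>> tails =
                Collectors.filtering(t -> t.contains("for"),
                        Collectors.mapping(t -> t.substring(t.indexOf("for")), Collectors.toList()));
        // One traversal: every element is offered to both collectors.
        return strs.stream()
                .collect(Collectors.teeing(wordCounts, tails, (c, t) -> Map.entry(c, t)));
    }

    public static void main(String[] args) {
        Map.Entry<List<Integer>, List<String>> result =
                process(List.of("t of r m", "t of r m", "nice for nice nice again"));
        System.out.println(result.getKey());   // [4, 4]
        System.out.println(result.getValue()); // [for nice nice again]
    }
}
```

Whether this is any faster than two separate passes is a separate question; it mainly saves writing the accumulator class.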