
Perform multiple unrelated operations on elements of a single stream in Java

How can I perform multiple unrelated operations on elements of a single stream?

Say I have a List<String> composed from a text. Each string in the list may or may not contain a certain word, which represents an action to perform. Let's say that:

  • if the string contains 'of', all the words in that string must be counted
  • if the string contains 'for', the portion after the first occurrence of 'for' must be returned, yielding a List<String> with all substrings

Of course, I could do something like this:

List<String> strs = ...;

List<Integer> wordsInStr = strs.stream()
    .filter(t -> t.contains("of"))
    .map(t -> t.split(" ").length)
    .collect(Collectors.toList());

List<String> linePortionAfterFor = strs.stream()
    .filter(t -> t.contains("for"))
    .map(t -> t.substring(t.indexOf("for")))
    .collect(Collectors.toList());

but then the list would be traversed twice, which could result in a performance penalty if strs contained lots of elements.

Is it possible to somehow execute those two operations without traversing twice over the list?

asked Oct 03 '17 07:10 by MC Emperor


3 Answers

If you want a single pass over the stream, one option is a custom Collector (which also supports parallel execution).

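import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class Splitter {
  public List<String> words = new ArrayList<>();
  public List<Integer> counts = new ArrayList<>();

  // Accumulator: called once per stream element.
  public void accept(String s) {
    if (s.contains("of")) {
      counts.add(s.split(" ").length);
    }
    // Checked independently of "of", so a string containing both words
    // triggers both actions, as in the two-stream version of the question.
    if (s.contains("for")) {
      words.add(s.substring(s.indexOf("for")));
    }
  }

  // Combiner: merges partial results when the stream runs in parallel.
  public Splitter merge(Splitter other) {
    words.addAll(other.words);
    counts.addAll(other.counts);
    return this;
  }
}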
Splitter collect = strs.stream().collect(
  Collector.of(Splitter::new, Splitter::accept, Splitter::merge)
);
System.out.println(collect.counts);
System.out.println(collect.words);
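
On Java 12 or later, a similar single-pass result can probably be had without a hand-written accumulator class by combining the JDK's Collectors.teeing with Collectors.filtering and Collectors.mapping. This is a different technique from the Splitter above, shown only as a minimal sketch; the class name TeeingExample and the Result record (which needs Java 16+) are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TeeingExample {
    // Hypothetical holder for the two result lists (not part of the original answers).
    public record Result(List<Integer> counts, List<String> portions) {}

    public static Result process(List<String> strs) {
        return strs.stream().collect(Collectors.teeing(
            // Branch 1: word count of every string containing "of"
            Collectors.filtering((String t) -> t.contains("of"),
                Collectors.mapping((String t) -> t.split(" ").length, Collectors.toList())),
            // Branch 2: portion of the string starting at the first "for"
            Collectors.filtering((String t) -> t.contains("for"),
                Collectors.mapping((String t) -> t.substring(t.indexOf("for")), Collectors.toList())),
            // Combine both branch results once the single traversal has finished
            Result::new));
    }

    public static void main(String[] args) {
        Result r = process(List.of("t of r m", "nice for nice nice again", "nothing here"));
        System.out.println(r.counts());   // [4]
        System.out.println(r.portions()); // [for nice nice again]
    }
}
```

Each element is offered to both branches, so a string containing both words contributes to both lists.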
answered Nov 15 '22 08:11 by Flown


This answer addresses the question from a different angle. First, let's look at how fast or slow iterating over a list or collection actually is. Here are the results on my machine from the performance test below:

When: string list length = 100, threads = 1, loops = 1000, unit = milliseconds

  • OP: 0.013
  • Accepted answer: 0.020
  • By the counter function: 0.010

When: string list length = 1,000,000, threads = 1, loops = 100, unit = milliseconds

  • OP: 99.387
  • Accepted answer: 89.848
  • By the counter function: 59.183


Conclusion: the performance improvement from the single-pass collector is small, and for a short string list it is even slower. Note that the fastest variant still traverses the list twice; it only replaces split(" ") with a plain character loop. Generally, it is a mistake to replace a second iteration over an in-memory list or collection with a more complicated collector: you won't gain much. If there is a performance problem, look for it elsewhere.

Here is my performance test code, using the Profiler tool. (I'm not going to discuss how to do performance testing here. If you doubt the results, you can rerun the test with any tool you trust.)

@Test
public void test_46539786() {
    final int strsLength = 1000_000;
    final int threadNum = 1;
    final int loops = 100;
    final int rounds = 3;

    final List<String> strs = IntStream.range(0, strsLength).mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i).toList();

    Profiler.run(threadNum, loops, rounds, "OP", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(t -> t.split(" ").length).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());

        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();

    Profiler.run(threadNum, loops, rounds, "Accepted answer", () -> {
        Splitter collect = strs.stream().collect(Collector.of(Splitter::new, Splitter::accept, Splitter::merge));
        assertTrue(collect.counts.size() == collect.words.size());
    }).printResult();

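    // Counts words by scanning for spaces instead of calling split(" ").
    // Assumes single-space separators with no leading or trailing space.
    final Function<String, Integer> counter = s -> {
        int count = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count + 1; // n spaces separate n + 1 words
    };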

    Profiler.run(threadNum, loops, rounds, "By the counter function", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(counter).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());

        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();
}
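
For readers without the Profiler tool, the two-pass measurement could be roughly reproduced with nothing but System.nanoTime. This is only a sketch: the class and method names (TwoPassTiming, sampleData, twoPasses, averageMillis) are hypothetical, the warm-up count is arbitrary, and a proper harness such as JMH will give more trustworthy numbers than a hand-rolled loop.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TwoPassTiming {
    // Same synthetic input shape as the test above: alternating "of" and "for" strings.
    public static List<String> sampleData(int n) {
        return IntStream.range(0, n)
            .mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i)
            .collect(Collectors.toList());
    }

    // The two-pass approach from the question; returns the combined result size.
    public static int twoPasses(List<String> strs) {
        List<Integer> counts = strs.stream()
            .filter(t -> t.contains("of"))
            .map(t -> t.split(" ").length)
            .collect(Collectors.toList());
        List<String> portions = strs.stream()
            .filter(t -> t.contains("for"))
            .map(t -> t.substring(t.indexOf("for")))
            .collect(Collectors.toList());
        return counts.size() + portions.size();
    }

    // Average wall-clock milliseconds per call after a few warm-up runs.
    public static double averageMillis(List<String> strs, int loops) {
        for (int i = 0; i < 5; i++) {
            twoPasses(strs); // warm-up, result ignored
        }
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            twoPasses(strs);
        }
        return (System.nanoTime() - start) / 1_000_000.0 / loops;
    }

    public static void main(String[] args) {
        List<String> strs = sampleData(100_000);
        System.out.printf("two passes: %.3f ms%n", averageMillis(strs, 20));
    }
}
```

The same averageMillis loop can be pointed at a single-pass collector to compare the two approaches on your own machine.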
answered Nov 15 '22 07:11 by 123-xyz


You could use a custom collector for that and iterate only once:

// Pair is not a JDK class; it is assumed to come from a library that offers Pair.of(...).
private static Collector<String, ?, Pair<List<String>, List<Long>>> multiple() {

    // Mutable accumulator holding both result lists.
    class Acc {
        List<String> strings = new ArrayList<>();
        List<Long> longs = new ArrayList<>();

        // The two checks are independent, so one string can feed both lists.
        void add(String elem) {
            if (elem.contains("of")) {
                long howMany = elem.split(" ").length;
                longs.add(howMany);
            }
            if (elem.contains("for")) {
                String result = elem.substring(elem.indexOf("for"));
                strings.add(result);
            }
        }

        // Combiner, used when the stream runs in parallel.
        Acc merge(Acc right) {
            longs.addAll(right.longs);
            strings.addAll(right.strings);
            return this;
        }

        public Pair<List<String>, List<Long>> finisher() {
            return Pair.of(strings, longs);
        }
    }

    return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}

Usage would be:

Pair<List<String>, List<Long>> pair = Stream.of("t of r m", "t of r m", "nice for nice nice again")
            .collect(multiple());
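
Since Pair is not part of the JDK, here is a self-contained variant of the same idea that swaps it for java.util.Map.Entry (created with Map.entry, Java 9+). It is a sketch rather than the answer's exact code: the class name MultipleCollectorDemo and the method name finish are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class MultipleCollectorDemo {
    // Mutable accumulator: one instance per (sub)stream, merged when run in parallel.
    static class Acc {
        final List<String> strings = new ArrayList<>();
        final List<Long> longs = new ArrayList<>();

        void add(String elem) {
            if (elem.contains("of")) {
                longs.add((long) elem.split(" ").length);
            }
            if (elem.contains("for")) {
                strings.add(elem.substring(elem.indexOf("for")));
            }
        }

        Acc merge(Acc right) {
            longs.addAll(right.longs);
            strings.addAll(right.strings);
            return this;
        }

        // Map.Entry stands in for the third-party Pair type.
        Map.Entry<List<String>, List<Long>> finish() {
            return Map.entry(strings, longs);
        }
    }

    public static Collector<String, ?, Map.Entry<List<String>, List<Long>>> multiple() {
        return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finish);
    }

    public static void main(String[] args) {
        Map.Entry<List<String>, List<Long>> pair =
            Stream.of("t of r m", "t of r m", "nice for nice nice again").collect(multiple());
        System.out.println(pair.getKey());   // [for nice nice again]
        System.out.println(pair.getValue()); // [4, 4]
    }
}
```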
answered Nov 15 '22 08:11 by Eugene