I'm trying to collect a stream while throwing away rarely used items, as in this example:
import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;
import org.junit.Test;

@Test
public void shouldFilterCommonlyUsedWords() {
    // given
    List<String> allWords = Arrays.asList(
        "call", "feel", "call", "very", "call", "very", "feel", "very", "any");

    // when
    Set<String> commonlyUsed = allWords.stream()
            .collect(groupingBy(Function.identity(), counting()))
            .entrySet().stream().filter(e -> e.getValue() > 2)
            .map(Map.Entry::getKey).collect(toSet());

    // then
    assertThat(commonlyUsed, containsInAnyOrder("call", "very"));
}
I have a feeling that it is possible to do this much more simply - am I right?
The groupingBy() method of the Collectors class groups objects by some property and stores the results in a Map instance. To use it, we need to specify the property by which the grouping is performed. It provides functionality similar to SQL's GROUP BY clause.
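For readers unfamiliar with this collector, here is a minimal sketch of groupingBy() combined with counting() (the class name is illustrative):

```java
import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;

public class GroupingByDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("call", "feel", "call", "very");
        // Group equal words together and count the occurrences of each
        Map<String, Long> counts = words.stream()
            .collect(groupingBy(Function.identity(), counting()));
        System.out.println(counts.get("call")); // 2
        System.out.println(counts.get("feel")); // 1
    }
}
```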
There is no way around creating a Map, unless you are willing to accept a very high CPU cost. However, you can remove the second collect operation:
Map<String, Long> map = allWords.stream()
        .collect(groupingBy(Function.identity(), HashMap::new, counting()));
map.values().removeIf(l -> l <= 2);
Set<String> commonlyUsed = map.keySet();
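The in-place filtering above works because Map.values() is a modifiable view backed by the map, so removing a value removes its whole entry. A minimal illustration (class name is illustrative):

```java
import java.util.*;

public class ValuesViewDemo {
    public static void main(String[] args) {
        Map<String, Long> map = new HashMap<>();
        map.put("call", 3L);
        map.put("feel", 2L);
        map.put("any", 1L);
        // values() is a live view of the map: removing values removes whole entries,
        // which is also reflected in keySet()
        map.values().removeIf(l -> l <= 2);
        System.out.println(map.keySet()); // [call]
    }
}
```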
Note that in Java 8, HashSet still wraps a HashMap, so using the keySet() of a HashMap when you want a Set in the first place doesn't waste space given the current implementation.
Of course, you can hide the post-processing in a Collector if that feels more "streamy":
Set<String> commonlyUsed = allWords.stream()
    .collect(collectingAndThen(
        groupingBy(Function.identity(), HashMap::new, counting()),
        map -> { map.values().removeIf(l -> l <= 2); return map.keySet(); }));
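As a quick sanity check, the one-collector version could be exercised on the question's data like this (class name is illustrative):

```java
import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;

public class CollectingAndThenDemo {
    public static void main(String[] args) {
        List<String> allWords = Arrays.asList(
            "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        // collectingAndThen applies a finisher function to the fully built Map
        Set<String> commonlyUsed = allWords.stream()
            .collect(collectingAndThen(
                groupingBy(Function.identity(), HashMap::new, counting()),
                map -> { map.values().removeIf(l -> l <= 2); return map.keySet(); }));
        System.out.println(commonlyUsed.equals(
            new HashSet<>(Arrays.asList("call", "very")))); // true
    }
}
```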
A while ago I wrote an experimental distinct(atLeast) method for my library:
public StreamEx<T> distinct(long atLeast) {
    if (atLeast <= 1)
        return distinct();
    AtomicLong nullCount = new AtomicLong();
    ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
    return filter(t -> {
        if (t == null) {
            return nullCount.incrementAndGet() == atLeast;
        }
        return map.merge(t, 1L, (u, v) -> (u + v)) == atLeast;
    });
}
So the idea was to use it like this:
Set<String> commonlyUsed = StreamEx.of(allWords).distinct(3).toSet();
This performs a stateful filtration, which looks a little bit ugly. I doubted whether such a feature would be useful, so I did not merge it into the master branch. Nevertheless, it does the job in a single stream pass. Probably I should revive it. Meanwhile, you can copy this code into a static method and use it like this:
Set<String> commonlyUsed = distinct(allWords.stream(), 3).collect(Collectors.toSet());
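The suggested static-method adaptation for plain streams might look like the following sketch (the free-standing distinct helper and class name are my assumptions, not part of StreamEx):

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.*;

public class DistinctAtLeast {
    // Static adaptation of the answer's stateful filter for a plain Stream
    public static <T> Stream<T> distinct(Stream<T> stream, long atLeast) {
        if (atLeast <= 1)
            return stream.distinct();
        AtomicLong nullCount = new AtomicLong();
        ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
        return stream.filter(t -> {
            if (t == null)
                return nullCount.incrementAndGet() == atLeast;
            // an element passes the filter exactly once: on its atLeast-th occurrence
            return map.merge(t, 1L, Long::sum) == atLeast;
        });
    }

    public static void main(String[] args) {
        List<String> allWords = Arrays.asList(
            "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        Set<String> commonlyUsed = distinct(allWords.stream(), 3)
            .collect(Collectors.toSet());
        // contains exactly "call" and "very" (iteration order of the set is unspecified)
        System.out.println(commonlyUsed.size()); // 2
    }
}
```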
Update (2015/05/31): I added the distinct(atLeast) method to StreamEx 0.3.1. It's implemented using a custom spliterator. Benchmarks showed that this implementation is significantly faster for sequential streams than the stateful filtering described above, and in many cases it's also faster than the other solutions proposed in this topic. It also works nicely if null is encountered in the stream (the groupingBy collector doesn't support null keys, so groupingBy-based solutions will fail if null is encountered).