
Extracting Map<K, Multiset<V>> from Stream of Streams in Java 8

I have a Stream of Streams of words (this format is not set by me and cannot be changed). For example:

Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
Stream<String> doc3 = Stream.of("how", "are", "what", "how");
Stream<Stream<String>> docs = Stream.of(doc1, doc2, doc3);

I'm trying to get this into a Map<String, Multiset<Integer>> (or its corresponding stream, as I want to process it further), where the String key is the word itself and the Multiset<Integer> holds the number of times that word appears in each document (zeros should be excluded). Multiset is a Google Guava class (not from java.util).

For example:

how   -> {1, 2}  // because it appears once in doc1, twice in doc3 and none in doc2(so doc2's count should not be included)
are   -> {1, 1}  // once in doc1 and once in doc3
you   -> {1, 1}  // once in doc1 and once in doc2
doing -> {3}     // thrice in doc1, none in the others
what  -> {2, 1}  // twice in doc2 and once in doc3
upto  -> {1}     // once in doc2

What is a good way to do this in Java 8?

I tried using flatMap, but the inner Stream greatly limits the options I have.

Anoop asked May 26 '17 18:05



3 Answers

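// Assumes imports of java.util.*, java.util.Map.Entry,
// java.util.function.Function and java.util.stream.Collectors.
Map<String, List<Long>> map = docs
        // Per document: word -> number of occurrences in that document.
        .flatMap(inner -> inner.collect(
                        Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream())
        // Across documents: word -> list of its per-document counts.
        .collect(Collectors.groupingBy(
                Entry::getKey,
                Collectors.mapping(Entry::getValue, Collectors.toList())));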

System.out.println(map);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}
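
The question asks for Integer counts, while Collectors.counting() produces Long. As a minimal sketch of the same two-stage idea using only the JDK (the class and method names below are made up for illustration, and a plain List stands in for Guava's Multiset), the per-document count can be narrowed with Collectors.collectingAndThen:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PerDocumentCounts {

    // Stage 1: reduce one document to (word -> occurrences in that document).
    static Map<String, Integer> countWords(Stream<String> doc) {
        return doc.collect(Collectors.groupingBy(
                Function.identity(),
                // counting() yields Long; narrow it to Integer as the question asks.
                Collectors.collectingAndThen(Collectors.counting(), Long::intValue)));
    }

    // Stage 2: regroup the per-document entries by word.
    static Map<String, List<Integer>> countsPerDocument(Stream<Stream<String>> docs) {
        return docs
                .flatMap(doc -> countWords(doc).entrySet().stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        Stream<Stream<String>> docs = Stream.of(
                Stream.of("how", "are", "you", "doing", "doing", "doing"),
                Stream.of("what", "what", "you", "upto"),
                Stream.of("how", "are", "what", "how"));

        Map<String, List<Integer>> result = countsPerDocument(docs);
        System.out.println(result.get("how"));   // [1, 2]
        System.out.println(result.get("doing")); // [3]
    }
}
```

A word that never occurs in a document produces no entry for that document in stage 1, so zero counts are left out without any extra filtering.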
Eugene answered Sep 24 '22 11:09


Since you are using Guava, you could take advantage of its utilities for working with streams, as well as its Table structure. Here's the code:

Table<String, Long, Long> result =
    Streams.mapWithIndex(docs, (doc, i) -> doc.map(word -> new SimpleEntry<>(word, i)))
        .flatMap(Function.identity())
        .collect(Tables.toTable(
            Entry::getKey, Entry::getValue, p -> 1L, Long::sum, HashBasedTable::create));

Here I'm using the Streams.mapWithIndex method to assign an index to each inner stream. Within the map function, I'm transforming each word to a pair that consists of the word and the index, so that I can later know to which document the word belongs.

Then, I'm flat-mapping the (word, index) pairs of all documents into one stream, and finally, I'm collecting all the pairs into a Guava Table by means of the Tables.toTable collector. The row is the word, the column is the document (represented by its index) and the value is the number of times the word occurs in that document (I'm assigning 1L to each occurrence of a (word, index) pair and using Long::sum to merge collisions).

You have all the info you need in the result table, but if you still need a map from each word to a Multiset of its counts (a Map<String, Multiset<Long>> here, because the counts are Long rather than Integer), you could do it this way:

Map<String, Multiset<Long>> map = Maps.transformValues(
    result.rowMap(),
    m -> HashMultiset.create(m.values()));

Note: you need Guava 21 or later for this to work, since Streams.mapWithIndex and Tables.toTable were added in that release.
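
For readers who cannot use Guava 21, the index-tagging idea described above can be approximated with JDK collectors alone. This is only a sketch under some assumptions: the class and method names are hypothetical, the outer stream is first collected into a List so each document can be addressed by position, and a nested Map<String, Map<Integer, Long>> plays the role of the Table's rowMap().

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class IndexedWordCounts {

    // Result shape: word -> (document index -> occurrences in that document).
    static Map<String, Map<Integer, Long>> countByWordAndDoc(Stream<Stream<String>> docs) {
        // Materialize the outer stream so each inner stream has a position.
        List<Stream<String>> docList = docs.collect(Collectors.toList());

        return IntStream.range(0, docList.size())
                .boxed()
                // Tag every word with the index of the document it came from.
                .flatMap(i -> docList.get(i)
                        .map(word -> new SimpleEntry<String, Integer>(word, i)))
                // Row = word, column = document index, value = occurrence count.
                .collect(Collectors.groupingBy(
                        e -> e.getKey(),
                        Collectors.groupingBy(e -> e.getValue(), Collectors.counting())));
    }

    public static void main(String[] args) {
        Stream<Stream<String>> docs = Stream.of(
                Stream.of("how", "are", "you", "doing", "doing", "doing"),
                Stream.of("what", "what", "you", "upto"),
                Stream.of("how", "are", "what", "how"));

        Map<String, Map<Integer, Long>> result = countByWordAndDoc(docs);
        System.out.println(result.get("how"));   // {0=1, 2=2}
        System.out.println(result.get("doing")); // {0=3}
    }
}
```

Unlike the list-based results, this shape keeps track of which document each count came from, which is the same extra information the Table holds.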

fps answered Sep 21 '22 11:09


Map<String, Multiset<Integer>> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(Collectors.groupingBy(Multiset.Entry::getElement,
                Collectors.mapping(Multiset.Entry::getCount,
                        Collectors.toCollection(HashMultiset::create))));

// {upto=[1], how=[1, 2], doing=[3], what=[1, 2], are=[1 x 2], you=[1 x 2]}

Multiset is useful for getting the word count, but not really necessary for storing the counts. If you're fine with Map<String, List<Integer>>, just replace the last line with Collectors.toList())));.

Or, since you're using Guava anyway, why not a ListMultimap?

ListMultimap<String, Integer> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(ArrayListMultimap::create,
                (r, e) -> r.put(e.getElement(), e.getCount()),
                Multimap::putAll);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}
Sean Van Gorder answered Sep 21 '22 11:09