Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complexity of grouping in Java8

I would like to learn the time complexity of the given statement below.(In Java8)

list.stream().collect(groupingBy(...)); 

Any idea?

like image 970
FreeMan Avatar asked Nov 26 '16 21:11

FreeMan


People also ask

How to group by in java 8?

In Java 8, you retrieve the stream from the list and use a Collector to group them in one line of code. It's as simple as passing the grouping condition to the collector and it is complete. By simply modifying the grouping condition, you can create multiple groups.

How does Collector groupingBy work?

The groupingBy() method of Collectors class in Java are used for grouping objects by some property and storing results in a Map instance. In order to use it, we always need to specify a property by which the grouping would be performed. This method provides similar functionality to SQL's GROUP BY clause.

What is a flatMap in Java 8?

In Java 8 Streams, the flatMap() method applies operation as a mapper function and provides a stream of element values. It means that in each iteration of each element the map() method creates a separate new stream. By using the flattening mechanism, it merges all streams into a single resultant stream.

What are the Java 8 features introduce in interface concepts?

Some of the important Java 8 features are; forEach() method in Iterable interface. default and static methods in Interfaces. Functional Interfaces and Lambda Expressions.


1 Answers

There is no general answer to that question, as the time complexity depends on all operations. Since the stream has to be processed entirely, there is a base time complexity of O(n) that has to be multiplied by the costs of all operations done per element. This, assuming that the iteration costs itself are not worse than O(n), which is the case for most stream sources.

So, assuming no intermediate operations that affect the time complexity, the groupingBy has to evaluate the function for each element, which should be independent of other elements, so not affect the time complexity (regardless of how expensive it is, as the O(…) time complexity only tells us, how the time scales with large numbers of stream elements). Then, it will insert the element into a map, which might depend on the number of already contained elements. Without a custom Map supplier, the map’s type is unspecified, hence, no statement can be made here.

In practice, it’s reasonable to assume that the result will be some sort of hashing map with a net O(1) lookup complexity by default. So we have a net time complexity of O(n) for the grouping. Then, we have the downstream collector.

The default downstream collector is toList(), which produces an unspecified List type, so again, we can’t say anything about the costs of adding elements to it.

The current implementation produces an ArrayList, which has to perform copy operations when the capacity is exceeded, but since the capacity is raised by a factor each time, there is still a net complexity of O(n) for adding n elements. It’s reasonable to assume that future changes to the toList() implementation won’t make the costs worse than what we have today. So the time complexity of a default groupingBy collection is likely O(n).

If we use a custom Map collector with a custom downstream collector, the complexity depends on the average number of groups to number of elements per group ratio. The worst case would be the worst of either, the map’s lookup and the downstream collector’s element processing (times the number of elements), as we could have one group containing all items or each item being in its own group.

But usually, you are capable of predicting a bias for a particular grouping operation, so you would want to calculate a time complexity for that particular operation, instead of relying on a statement about all grouping operations in general.

like image 158
Holger Avatar answered Oct 04 '22 20:10

Holger