
What's the asymptotic complexity of GroupBy operation?

I am interested in the asymptotic complexity (big O) of the GroupBy operation on unindexed datasets. What's the complexity of the best known algorithm and what's the complexity for algorithms that SQL servers and LINQ are using?

Jakub Šturc asked Feb 03 '11 17:02



3 Answers

Setting aside the cost of the underlying query that feeds it, the GROUP BY operation itself is O(n): the rows are scanned once and aggregated in a single pass (assuming each row's group can be found in constant time, as with a hash table). The cost scales linearly with n, the number of rows in the dataset.

When GROUP BY is part of a more complex query, the picture changes: O(n) becomes an upper bound on what the GROUP BY adds to the overall cost. It could add less if resolving the inner query already leaves the data sorted.
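As a rough illustration of the single-pass aggregation described above, here is a minimal Python sketch. It assumes a hash-based strategy (a dictionary keyed by the grouping value); the function name `hash_aggregate` and the sample rows are hypothetical, and a real database engine may pick a different strategy.

```python
from collections import defaultdict

def hash_aggregate(rows, key, value):
    """Sum value(row) per key(row) in one pass over rows (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:                      # single scan: n iterations
        totals[key(row)] += value(row)    # expected O(1) dictionary update
    return dict(totals)

# Roughly: SELECT category, SUM(amount) FROM rows GROUP BY category
rows = [("fruit", 3), ("veg", 5), ("fruit", 4), ("veg", 1), ("dairy", 2)]
result = hash_aggregate(rows, key=lambda r: r[0], value=lambda r: r[1])
print(result)
```

The O(n) figure here is an expected cost, since it relies on dictionary operations taking constant time on average.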

RichardTheKiwi answered Nov 02 '22 15:11



Grouping can be done in one pass, O(n), over sorted rows, and sorting the rows costs O(n log n), so the overall complexity of GROUP BY is O(n log n), where n is the number of rows. If an index lets the rows be read already ordered by the columns used in the GROUP BY clause, the sort is unnecessary and the complexity is O(n).
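The sort-then-scan approach can be sketched in Python as follows. This is only an illustration under the assumption that no index is available, so the rows must be sorted first; `sort_group_count` and the sample data are hypothetical names, not part of any SQL server.

```python
from itertools import groupby
from operator import itemgetter

def sort_group_count(rows, key):
    """Count rows per key by sorting first, then scanning once (hypothetical helper)."""
    ordered = sorted(rows, key=key)       # O(n log n) sort on the grouping key
    # groupby only merges adjacent equal keys, which is why the sort is needed;
    # this scan over the sorted rows is O(n).
    return [(k, sum(1 for _ in group)) for k, group in groupby(ordered, key=key)]

# Roughly: SELECT category, COUNT(*) FROM rows GROUP BY category
rows = [("fruit", 3), ("veg", 5), ("fruit", 4), ("veg", 1), ("dairy", 2)]
counts = sort_group_count(rows, key=itemgetter(0))
print(counts)
```

If the input already arrives in key order, the `sorted` call can be dropped and only the linear scan remains.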

JosefN answered Nov 02 '22 13:11



As for LINQ, I assume you want to know the complexity of the LINQ to Objects group by (Enumerable.GroupBy).

Checking the implementation with ILSpy (.NET Framework 4 series), it appears to me to be O(n).

It enumerates the source collection once. For each element, it computes the grouping key. It then checks whether that key is already present in a hash table that maps keys to lists of elements, and adds the key if it is missing. Finally, it adds the element to the list stored under that key.
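The steps above can be mirrored in a short Python sketch. This is an analogue of the described behaviour, not the actual .NET source; `group_by` and `key_selector` are hypothetical names chosen for the example.

```python
def group_by(source, key_selector):
    """Group elements by key in one pass using a hash table (hypothetical analogue)."""
    groups = {}                           # key -> list of elements
    for element in source:                # enumerate the source once
        key = key_selector(element)       # compute the grouping key
        if key not in groups:             # expected O(1) hash lookup
            groups[key] = []              # first time this key is seen
        groups[key].append(element)       # add element to its group's list
    # Python dicts preserve insertion order, so groups come out in first-seen key order.
    return list(groups.items())

words = ["apple", "avocado", "banana", "blueberry", "cherry"]
grouped = group_by(words, key_selector=lambda w: w[0])
print(grouped)
```

Each element costs one key computation and one expected constant-time hash table operation, which gives the O(n) total.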

Frédéric answered Nov 02 '22 14:11
