Is there a way to figure out in advance (not by trial and error) whether a specific query should use GROUP BY or GROUP EACH BY? So far we have seen that once cardinality passes roughly 60-70%, we are asked to use GROUP EACH BY. It is hard to predict, since we generate the SQL.
Whether you need 'EACH' depends on the data, not on the query. Does the group expression have a small number of unique values? Use GROUP BY. Does it have a lot? Use GROUP EACH BY.
The best strategy is to use GROUP BY until you get an "over limits" error, and only then switch to GROUP EACH BY.
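Since the SQL is generated anyway, that fallback can be automated. Here is a minimal sketch in Python, under some loud assumptions: `run_query` is a hypothetical callable wrapping whatever client you use, `QueryError` is a stand-in for the exception that client raises, and the check for "resources exceeded" in the message is a guess at how the over-limits error shows up, so adjust it to what you actually see.

```python
import re


class QueryError(Exception):
    """Hypothetical stand-in for the error your BigQuery client raises."""


def to_group_each_by(sql):
    # Rewrite plain GROUP BY into GROUP EACH BY.
    # An existing GROUP EACH BY does not match the pattern and is left alone.
    return re.sub(r"\bGROUP\s+BY\b", "GROUP EACH BY", sql, flags=re.IGNORECASE)


def is_over_limits(error):
    # Assumption: the over-limits error mentions "resources exceeded".
    return "resources exceeded" in str(error).lower()


def run_with_fallback(sql, run_query):
    """Try the query as written; retry once with GROUP EACH BY if over limits."""
    try:
        return run_query(sql)
    except QueryError as error:
        if not is_over_limits(error):
            raise  # some other failure, not ours to handle
        retry_sql = to_group_each_by(sql)
        if retry_sql == sql:
            raise  # nothing to rewrite, so a retry would fail the same way
        return run_query(retry_sql)
```

The regex rewrite is naive: it would also touch a GROUP BY inside a string literal or a subquery, so in a real generator it is probably cleaner to flip a flag where the clause is emitted.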
To go deeper into the "why?", you can look at the Dremel paper that started it all. Basically GROUP BY runs in the mixers, while GROUP EACH BY gets pushed to the shards.
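To make the mixer/shard distinction concrete, here is a toy model in Python. It is not how Dremel is implemented. It assumes that "pushed to the shards" means rows are partitioned by a hash of the group key, so that each shard only has to hold its own slice of the distinct keys, while the plain GROUP BY path has to hold every distinct key in one place. The function names and the shard count are made up for the illustration.

```python
import zlib
from collections import Counter


def group_by_single_node(rows):
    # Toy "mixer": one node keeps a counter for every distinct key.
    return Counter(rows)


def group_each_by_sharded(rows, num_shards=4):
    # Toy "shards": route each key to one shard by hash, so the sets of
    # keys held by different shards never overlap.
    shards = [Counter() for _ in range(num_shards)]
    for key in rows:
        shard_index = zlib.crc32(key.encode("utf-8")) % num_shards
        shards[shard_index][key] += 1

    # Because the key sets are disjoint, combining the shard outputs is
    # just a union; no counts need to be added across shards.
    combined = Counter()
    for shard in shards:
        combined.update(shard)

    largest_shard = max(len(shard) for shard in shards)
    return combined, largest_shard
```

With few distinct keys the single-node version is cheaper because nothing has to be moved around; with many distinct keys, the largest table any one node holds is what matters, and the sharded version keeps that smaller.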
For other insights, check jcondit's answers to the question "Resources Exceeded during query execution".