When performing the distinct operation in Spark,
Question: Where does the second level of distinct computation occur? Does it happen at the executor level or directly at the driver?
Sheer logic should tell you the answer (for a dataframe):
Imagine N partitions with a column x. The steps are:
There is no need for local aggregation on the driver, nor for a single executor as is the case for an order by without a grouping column.
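To make the reasoning concrete, here is a minimal pure-Python sketch of a two-level distinct. It is not Spark code: the function names (`local_distinct`, `shuffle_by_hash`, `distributed_distinct`) and the sample data are made up for illustration, and it assumes the usual plan shape of a partial aggregate per partition, a hash-partitioned exchange, and a final aggregate per shuffled partition.

```python
from collections import defaultdict


def local_distinct(partition):
    """Level 1: each input partition drops its own duplicates (executor side)."""
    return set(partition)


def shuffle_by_hash(partials, num_partitions):
    """Exchange: route every value by hash so equal values land in the same partition."""
    targets = defaultdict(list)
    for partial in partials:
        for value in partial:
            targets[hash(value) % num_partitions].append(value)
    return [targets[i] for i in range(num_partitions)]


def distributed_distinct(partitions, num_shuffle_partitions=3):
    # Level 1: partial dedup inside each of the N input partitions.
    partials = [local_distinct(p) for p in partitions]
    # Shuffle: co-locate equal values of x.
    shuffled = shuffle_by_hash(partials, num_shuffle_partitions)
    # Level 2: final dedup per shuffled partition, still in parallel on executors.
    return [set(p) for p in shuffled]


# Hypothetical data: three partitions holding values of column x.
partitions = [[1, 2, 2, 3], [3, 4, 4, 1], [5, 5, 2, 6]]
result = distributed_distinct(partitions)
```

Because equal values always hash to the same target partition, the per-partition results are disjoint and nothing has to be merged on the driver. In Spark itself, calling `explain()` on `df.select("x").distinct()` typically shows the same shape: a partial `HashAggregate`, an `Exchange hashpartitioning`, and a final `HashAggregate`.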