
Spark window function without orderBy

I have a DataFrame with columns a and b. I want to partition the data by a using a window function and then assign a unique index to each b within its partition:

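import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
val window_filter = Window.partitionBy($"a").orderBy($"b".desc)
df.withColumn("uid", row_number().over(window_filter))  // df is the DataFrame with columns a, b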

But for this use-case, ordering by b is unneeded and may be time consuming. How can I achieve this without ordering?

DeanLa asked May 03 '26 22:05



1 Answer

Calling row_number() without an ORDER BY, or with an ORDER BY on a constant, is non-deterministic: because rows are processed in parallel, the same row can receive a different number from run to run. The same applies when the ORDER BY column has the same value for several rows in a partition: the relative order of those tied rows is not guaranteed, so their numbers can also change between runs.
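To make this concrete, here is a minimal sketch of two ways to get a uid without sorting on b. It is an illustration under stated assumptions, not a definitive recipe. The object name, the sample rows and the local SparkSession are hypothetical. As far as I know, Spark's DataFrame API rejects row_number() over a window that has no orderBy at all, so the first variant orders by a constant instead. The second variant swaps in monotonically_increasing_id(), a different technique: its ids are unique across the whole DataFrame but are not consecutive and do not restart for each value of a. Neither variant assigns the same uid to the same row on every run.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, monotonically_increasing_id, row_number}

object UidWithoutOrdering {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("uid-without-ordering")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the two columns from the question.
    val df = Seq(("x", 10), ("x", 20), ("y", 30)).toDF("a", "b")

    // Variant 1: order by a constant. Numbers are consecutive within each
    // value of a, but which row gets which number is not guaranteed.
    val byConstant = Window.partitionBy($"a").orderBy(lit(1))
    val numbered = df.withColumn("uid", row_number().over(byConstant))

    // Variant 2: no window at all. Ids are unique across the DataFrame,
    // not consecutive, and not restarted per value of a.
    val tagged = df.withColumn("uid", monotonically_increasing_id())

    numbered.show()
    tagged.show()

    spark.stop()
  }
}
```

If downstream code needs the uid to be reproducible, ordering by a column (or a combination of columns) that is unique within each partition remains the safer choice, at the cost of the sort.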

leftjoin answered May 05 '26 15:05



