Spark dataframe calculate the row-wise minimum [duplicate]

I'm trying to put the minimum value of a few columns into a separate column (creating a min column). The operation is pretty straightforward, but I wasn't able to find the right function for it:
A B min
1 2 1
2 1 1
3 1 1
1 4 1

Thanks a lot for your help!

asked Nov 16 '25 by Bubble Bubble Bubble Gut

1 Answer

You can use the least function. In pyspark:

from pyspark.sql.functions import least
df.withColumn('min', least('A', 'B')).show()
#+---+---+---+
#|  A|  B|min|
#+---+---+---+
#|  1|  2|  1|
#|  2|  1|  1|
#|  3|  1|  1|
#|  1|  4|  1|
#+---+---+---+

If you have a list of column names:

cols = ['A', 'B']
df.withColumn('min', least(*cols))
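
For reference, here is a minimal end-to-end sketch of the pyspark approach; the SparkSession setup and sample data are assumptions added for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import least

spark = SparkSession.builder.getOrCreate()

# Sample data matching the question
df = spark.createDataFrame([(1, 2), (2, 1), (3, 1), (1, 4)], ['A', 'B'])

# least takes two or more columns and returns the row-wise minimum
cols = ['A', 'B']
df.withColumn('min', least(*cols)).show()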

Similarly in Scala:

import org.apache.spark.sql.functions.least
import spark.implicits._  // needed for the $"colName" syntax
df.withColumn("min", least($"A", $"B")).show
+---+---+---+
|  A|  B|min|
+---+---+---+
|  1|  2|  1|
|  2|  1|  1|
|  3|  1|  1|
|  1|  4|  1|
+---+---+---+

If the columns are stored in a Seq:

val cols = Seq("A", "B")
df.withColumn("min", least(cols.head, cols.tail: _*))
answered Nov 19 '25 by Psidom


