PySpark: Absolute value of a column. TypeError: a float is required

I have a DataFrame df created as follows:

from pyspark.sql.types import StructType, StructField, StringType, FloatType

schema = StructType([StructField('Id', StringType(), False),
                     StructField('Value', FloatType(), False)])
df = spark.createDataFrame([('a',5.0),('b',1.0),('c',-0.3)],schema)

It looks like this:

+---+-----+
| Id|Value|
+---+-----+
|  a|  5.0|
|  b|  1.0|
|  c| -0.3|
+---+-----+

Now I want to take the absolute value of the Value column, which should return

+---+-----+
| Id|Value|
+---+-----+
|  a|  5.0|
|  b|  1.0|
|  c|  0.3|
+---+-----+

I've tried

df = df.withColumn('Value',math.fabs(df.Value))

But it fails with TypeError: a float is required, even though the Value column was declared as FloatType().

Any clue on how to correctly do this? Thanks!

Yuehan Lyu asked May 18 '17 12:05


1 Answer

You can use the native Spark function abs():

from pyspark.sql.functions import abs

df1 = df.withColumn('Value',abs(df.Value))
df1.show()
+---+-----+
| Id|Value|
+---+-----+
|  a|  5.0|
|  b|  1.0|
|  c|  0.3|
+---+-----+
mtoto answered Oct 10 '22 20:10