I have the following UDF applied to a PySpark DataFrame. The code works fine, except that when myFun1('oldColumn')
is null, I want the output to be an empty string instead of null.
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
myFun1 = udf(lambda x: myModule.myFunction1(x), StringType())
myDF = myDF.withColumn('newColumn', myFun1('oldColumn'))
Is it possible to do this in place instead of creating another UDF? Thanks!
Using df.fillna() or df.na.fill() to replace null values with an empty string worked for me.
You can limit the replacement to specific columns by passing a dict that maps each column name to the value that should replace its nulls:
myDF = myDF.na.fill({'oldColumn': ''})
The PySpark docs have an example:
>>> df4.na.fill({'age': 50, 'name': 'unknown'}).show()
+---+------+-------+
|age|height|   name|
+---+------+-------+
| 10|    80|  Alice|
|  5|  null|    Bob|
| 50|  null|    Tom|
| 50|  null|unknown|
+---+------+-------+
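If you would rather handle the null inside the existing UDF than fill it afterwards, one option is to wrap the Python function so that a None result becomes an empty string. This is only a sketch in plain Python: my_function1 is a hypothetical stand-in for myModule.myFunction1 (whose real behavior is not shown in the question), and empty_if_none is a helper name made up for this example.

```python
def my_function1(x):
    # Hypothetical stand-in for myModule.myFunction1:
    # returns None for missing input, otherwise an upper-cased string.
    if x is None:
        return None
    return x.upper()


def empty_if_none(func):
    """Wrap func so that a None result is replaced by an empty string."""
    def wrapper(x):
        result = func(x)
        return '' if result is None else result
    return wrapper


safe_function1 = empty_if_none(my_function1)

# The wrapped function could then be registered in place of the original
# lambda, for example (assuming the imports from the question):
#   myFun1 = udf(safe_function1, StringType())
#   myDF = myDF.withColumn('newColumn', myFun1('oldColumn'))
```

The explicit `is None` check is used instead of `result or ''` so that other falsy return values are passed through unchanged.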