pyspark/dataframe: replace null with empty space

I have the following udf in a pyspark dataframe. The code works fine, except that when myFun1('oldColumn') is null I want the output to be an empty string instead of null.

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
myFun1 = udf(lambda x: myModule.myFunction1(x), StringType())
myDF = myDF.withColumn('newColumn', myFun1('oldColumn'))

Is it possible to do this in place instead of creating another udf function? Thanks!

asked Feb 06 '23 by Edamame

1 Answer

Using df.fillna() or df.na.fill() to replace null values with an empty string worked for me.

You can do replacements per column by passing a dict that maps each column name to the value its nulls should be replaced with:

myDF = myDF.na.fill({'oldColumn': ''})
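
In the question's setup, the same idea works on the column produced by the udf. A minimal sketch, reusing myFun1 from the question, so that any null the udf returns ends up as an empty string in newColumn:

myDF = myDF.withColumn('newColumn', myFun1('oldColumn')).na.fill({'newColumn': ''})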

The PySpark docs have an example:

>>> df4.na.fill({'age': 50, 'name': 'unknown'}).show()
+---+------+-------+
|age|height|   name|
+---+------+-------+
| 10|    80|  Alice|
|  5|  null|    Bob|
| 50|  null|    Tom|
| 50|  null|unknown|
+---+------+-------+
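
For reference, here is a self-contained sketch that reproduces an output like the one above; the DataFrame is constructed by hand just for illustration, assuming a local SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Example data with nulls in both a numeric and a string column
df4 = spark.createDataFrame(
    [(10, 80, 'Alice'), (5, None, 'Bob'), (None, None, 'Tom'), (None, None, None)],
    'age INT, height INT, name STRING')

# Fill nulls per column: 50 for 'age', 'unknown' for 'name' ('height' is left untouched)
df4.na.fill({'age': 50, 'name': 'unknown'}).show()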
answered Feb 11 '23 by scmz