I want to replace null values in one column with the values from an adjacent column. For example, if I have:
A | B
0 | 1
2 | null
3 | null
4 | 2
I want it to be:
A | B
0 | 1
2 | 2
3 | 3
4 | 2
Tried with
df.na.fill(df.A,"B")
But it didn't work; it says value should be a float, int, long, string, or dict.
Any ideas?
In PySpark, DataFrame.fillna() or DataFrameNaFunctions.fill() is used to replace NULL/None values in all or selected DataFrame columns with zero (0), an empty string, a space, or any constant literal value.
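For example, here is a minimal sketch of fillna() with a constant value; the example data below simply mirrors the DataFrame from the question and is built here for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Example data mirroring the question: column B contains nulls
    df = spark.createDataFrame([(0, 1), (2, None), (3, None), (4, 2)], ["A", "B"])

    # Replace every null in the DataFrame with the constant 0
    df.fillna(0).show()

    # The same operation through the na accessor
    df.na.fill(0).show()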
The ISNULL function is a built-in SQL function for replacing nulls with a specified replacement value: pass the column name as the first parameter and the replacement value as the second.
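Note that this two-argument form is SQL-style (e.g. T-SQL's ISNULL(column, replacement)); Spark SQL's own isnull() only tests for null, but nvl()/ifnull() provide the same replacement behavior. A hedged sketch, continuing with the spark session and df from above and using an arbitrary view name tbl:

    # Register the example DataFrame as a temporary view (the name "tbl" is arbitrary)
    df.createOrReplaceTempView("tbl")

    # nvl(B, 0) returns 0 wherever B is null; ifnull() is an equivalent alias
    spark.sql("SELECT A, nvl(B, 0) AS B FROM tbl").show()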
The fillna() function was introduced in Spark version 1.3.1 and is used to replace null values with another specified value. It accepts two parameters, value and subset; value is the value you want to replace nulls with, and subset optionally limits the replacement to the listed columns.
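As a quick sketch of the subset parameter, continuing with df from above, this fills nulls only in column B and leaves A untouched:

    # value is the literal to use; subset limits the fill to the listed columns
    df.fillna(value=0, subset=["B"]).show()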
By using expr() and regexp_replace() you can replace a column's value with a value from another DataFrame column. In the example below, we match the value from col2 inside col1 and replace it with col3 to create new_column. expr() provides SQL-like expressions and lets you refer to another column within the operation.
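The original example is not reproduced here, so the following is a minimal sketch of the pattern; the data values are invented for illustration (col1 holds the full string, col2 the substring to match, col3 the replacement), and it continues with the spark session from above:

    from pyspark.sql.functions import expr

    # Hypothetical data: replace the substring named in col2 with the value in col3
    df2 = spark.createDataFrame([("ABCDE_XYZ", "XYZ", "FGH")], ["col1", "col2", "col3"])

    # Inside expr(), regexp_replace can take other columns as its pattern and replacement
    df2.withColumn("new_column", expr("regexp_replace(col1, col2, col3)")).show()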
We can use coalesce:

    from pyspark.sql.functions import coalesce

    df.withColumn("B", coalesce(df.B, df.A))
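coalesce() returns the first non-null of its arguments for each row, so B falls back to A exactly where it is null. This is also why the original attempt failed: df.na.fill() only accepts a literal value (float, int, long, string, or dict), not another column, whereas coalesce() works column-to-column.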