PySpark replace null in column with value in other column

I want to replace null values in one column with the values in an adjacent column. For example, if I have:

A|B
0,1
2,null
3,null
4,2

I want it to be:

A|B
0,1
2,2
3,3
4,2

I tried:

df.na.fill(df.A,"B") 

But it didn't work; it says value should be a float, int, long, string, or dict.

Any ideas?

Luis Leal asked Mar 24 '17 at 02:03


People also ask

How do you replace NULL values in a column in PySpark?

In PySpark, DataFrame.fillna() or DataFrameNaFunctions.fill() is used to replace NULL/None values in all columns, or in a selected subset of columns, with zero (0), an empty string, a space, or any other constant literal value.

How do you replace NULL values with other values?

In SQL Server (T-SQL), ISNULL is a built-in function that replaces nulls with a specified replacement value. To use it, pass the column name as the first argument and the value you want to substitute for null as the second.

How do you replace NULL values with some other value or discard the rows with NULL values in Spark?

The fillna() function was introduced in Spark 1.3.1 and is used to replace null values with another specified value. It accepts two parameters, value and subset: value is the replacement you want for nulls, and subset optionally limits the replacement to the listed columns.

How do you replace a value with another value in PySpark DataFrame?

By using expr() together with regexp_replace(), you can replace a column value with a value taken from another column of the same DataFrame. For example, you can match the value of col2 inside col1 and replace it with col3 to create new_column. expr() takes a SQL-like expression string, which lets regexp_replace refer to other columns for its pattern and replacement.


1 Answer

We can use coalesce, which returns the first non-null value among its arguments:

from pyspark.sql.functions import coalesce

# Keep B where it is set, otherwise take the value from A
df = df.withColumn("B", coalesce(df.B, df.A))
Luis Leal answered Sep 18 '22 at 06:09