I want to replace null values in one column with the values in an adjacent column ,for example if i have <pre class="prettyprint"><code>A|B 0,1 2,null 3,null 4,2 </code></pre> I want it to be: <pre class="prettyprint"><code>A|B 0,1 2,2 3,3 4,2 </code></pre> Tried with <pre class="prettyprint"><code>df.na.fill(df.A,"B") </code></pre> But didnt work, it says value should be a float, int, long, string, or dict Any ideas?

We can use coalesce <pre class="prettyprint"><code>from pyspark.sql.functions import coalesce df.withColumn("B",coalesce(df.B,df.A)) </code></pre>

PySpark replace null in column with value in other column

Tags:

python

apache-spark

pyspark

I want to replace null values in one column with the values in an adjacent column ,for example if i have

A|B 0,1 2,null 3,null 4,2

I want it to be:

A|B 0,1 2,2 3,3 4,2

Tried with

df.na.fill(df.A,"B")

But didnt work, it says value should be a float, int, long, string, or dict

Any ideas?

456

asked Mar 24 '17 02:03

Luis Leal

1 Answers

We can use coalesce

from pyspark.sql.functions import coalesce      df.withColumn("B",coalesce(df.B,df.A))

102

answered Sep 18 '22 06:09

Luis Leal

Related questions
                            
                                What is first class function in Python
                            
                                Tensorflow Confusion Matrix in TensorBoard
                            
                                Can I pass arguments to pytest fixtures?
                            
                                Sorting a defaultdict by value in python
                            
                                Python - dump dict as a json string
                            
                                Is it possible to remove a break point set with ipdb.set_trace()?
                            
                                Exception Handling in Pandas .apply() function
                            
                                TypeError when converting dictionary to JSON array
                            
                                Update multiple objects at once in Django?
                            
                                Convert python 2 code to 3 in PyCharm
                            
                                RemoveError: 'setuptools' is a dependency of conda and cannot be removed from conda's operating environment
                            
                                AttributeError: module 'tensorflow.python.keras.utils.generic_utils' has no attribute 'populate_dict_with_module_objects'
                            
                                Django and urls.py: How do I HttpResponseRedirect via a named url?
                            
                                Python client library for WebDAV
                            
                                QListWidget and Multiple Selection
                            
                                Counting depth or the deepest level a nested list goes to
                            
                                How to convert a python datetime.datetime to excel serial date number
                            
                                Python: Elegant and efficient ways to mask a list
                            
                                Are there default icons in PyQt/PySide?
                            
                                Add n business days to a given date ignoring holidays and weekends in python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

PySpark replace null in column with value in other column

Tags:

python

apache-spark

pyspark

Luis Leal

People also ask

1 Answers

Luis Leal

Recent Activity

Donate For Us