How to return rows with Null values in pyspark dataframe?

I am trying to get the rows with null values from a pyspark dataframe. In pandas, I can achieve this using isnull() on the dataframe:

df = df[df.isnull().any(axis=1)]

But in PySpark, when I run the command below, it raises an AttributeError:

df.filter(df.isNull())

AttributeError: 'DataFrame' object has no attribute 'isNull'.

How can I get the rows with null values without checking each column individually?

asked Nov 26 '18 by dg S

People also ask

How do you drop rows with null values in Spark DataFrame?

To remove rows with NULL values in selected columns of a Spark DataFrame, use drop(columns:Seq[String]) or drop(columns:Array[String]). Pass these functions the names of the columns you want to check for NULL values; rows containing NULLs in those columns are deleted.

How do I assign a null value in PySpark?

To replace an empty value with None/null in a single DataFrame column, you can use withColumn() with when().otherwise().

How do I find null values in a column in PySpark?

In a PySpark DataFrame you can count the Null, None, NaN, or empty/blank values in a column by combining isNull() from the Column class with the SQL functions isnan(), count(), and when().

IS NULL function in PySpark?

The isNull() function checks whether the current expression is NULL/None, or whether a column contains a NULL/None value; if so, it returns the boolean value True.


1 Answer

You can filter the rows with where, reduce, and a generator expression. For example, given the following dataframe:

df = sc.parallelize([
    (0.4, 0.3),
    (None, 0.11),
    (9.7, None), 
    (None, None)
]).toDF(["A", "B"])

df.show()
+----+----+
|   A|   B|
+----+----+
| 0.4| 0.3|
|null|0.11|
| 9.7|null|
|null|null|
+----+----+

Filtering the rows with some null value could be achieved with:

import pyspark.sql.functions as f
from functools import reduce

df.where(reduce(lambda x, y: x | y, (f.col(x).isNull() for x in df.columns))).show()

Which gives:

+----+----+
|   A|   B|
+----+----+
|null|0.11|
| 9.7|null|
|null|null|
+----+----+

In the condition you choose how to combine the per-column checks: | (or) keeps rows where any column is null, while & (and) keeps rows where all columns are null.

answered Oct 21 '22 by Amanda