Suppose we have a simple dataframe:
from pyspark.sql.types import *

schema = StructType([
    StructField('id', LongType(), False),
    StructField('name', StringType(), False),
    StructField('count', LongType(), True),
])

df = spark.createDataFrame([(1, 'Alice', None), (2, 'Bob', 1)], schema)
The question is: how do I detect null values? I tried the following:
df.where(df.count == None).show()
df.where(df.count is 'null').show()
df.where(df.count == 'null').show()
All of these result in the error:
condition should be string or Column
I know the following works:
df.where("count is null").show()
But is there a way to achieve this without using the full string, i.e. with df.count...?
You can use the Spark SQL function isnull:
from pyspark.sql import functions as F
df.where(F.isnull(F.col("count"))).show()
or directly with the Column method isNull:
df.where(F.col("count").isNull()).show()
Another way of doing the same is by using the filter API:
from pyspark.sql import functions as F
df.filter(F.isnull("count")).show()