How to check if a Spark DataFrame is empty?

For Spark 2.1.0, my suggestion would be to use head(n: Int) or take(n: Int) with isEmpty, whichever one has the clearest intent to you.

df.head(1).isEmpty
df.take(1).isEmpty

with the Python equivalent:

len(df.head(1)) == 0  # or: not df.head(1)
len(df.take(1)) == 0  # or: not df.take(1)
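
The Python check can be wrapped in a small helper. This is only a minimal sketch: `is_empty` and `FakeFrame` are hypothetical names, and `FakeFrame` is a stand-in exposing `head(n)` so the snippet runs without a Spark session. With a real PySpark DataFrame you would pass `df` directly.

```python
def is_empty(df):
    """Return True when df has no rows.

    Relies only on df.head(1) returning a list of at most one row,
    which is how PySpark's DataFrame.head(n) behaves when n is given.
    """
    return len(df.head(1)) == 0


class FakeFrame:
    """Hypothetical stand-in for a DataFrame (not part of Spark)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def head(self, n):
        # Mimics head(n): return up to n rows as a list.
        return self._rows[:n]


print(is_empty(FakeFrame([])))          # True
print(is_empty(FakeFrame([("a", 1)])))  # False
```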

Using df.first() and df.head() will both throw a java.util.NoSuchElementException if the DataFrame is empty. first() calls head() directly, which calls head(1).head.

def first(): T = head()
def head(): T = head(1).head

head(1) returns an Array, so taking head on that Array causes the java.util.NoSuchElementException when the DataFrame is empty.

def head(n: Int): Array[T] = withAction("head", limit(n).queryExecution)(collectFromPlan)

So instead of calling head(), use head(1) directly to get the array and then you can use isEmpty.
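
To make the call chain concrete, here is a toy Python model of it. It is not Spark code: `ToyDataset` is a hypothetical class, a Python list stands in for the Scala Array, and `IndexError` plays the role of java.util.NoSuchElementException.

```python
class ToyDataset:
    """Hypothetical model of the Scala head/first chain (not Spark code)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def head_n(self, n):
        # Models head(n: Int): always returns a list, possibly empty.
        return self._rows[:n]

    def head(self):
        # Models head(): head(1).head, which fails on an empty list.
        return self.head_n(1)[0]

    def first(self):
        # Models first(): delegates straight to head().
        return self.head()


empty = ToyDataset([])

# Safe: inspect the list returned by head_n(1).
print(len(empty.head_n(1)) == 0)  # True

# Unsafe: first() indexes into the empty list and raises.
try:
    empty.first()
except IndexError:
    print("first() raised on an empty dataset")
```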

take(n) is also equivalent to head(n)...

def take(n: Int): Array[T] = head(n)

And limit(1).collect() is equivalent to head(1) (notice limit(n).queryExecution in the head(n: Int) method), so the following are all equivalent, at least from what I can tell, and you won't have to catch a java.util.NoSuchElementException when the DataFrame is empty.

df.head(1).isEmpty
df.take(1).isEmpty
df.limit(1).collect().isEmpty

I know this is an older question so hopefully it will help someone using a newer version of Spark.

I would say to just grab the underlying RDD. In Scala:

df.rdd.isEmpty

in Python:

df.rdd.isEmpty()

That being said, all this does is call take(1).length, so it'll do the same thing as Rohan answered...just maybe slightly more explicit?

I had the same question, and I tested 3 main solutions:

1. (df != null) && (df.count > 0)
2. df.head(1).isEmpty() as @hulin003 suggests
3. df.rdd.isEmpty() as @Justin Pihony suggests

Of course all 3 work. In terms of performance, however, here are the execution times I measured when running these methods on the same DataFrame on my machine:

1. it takes ~9366ms
2. it takes ~5607ms
3. it takes ~1921ms

Therefore I think the best solution is df.rdd.isEmpty(), as @Justin Pihony suggests.
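
Numbers like these come from one DataFrame on one machine, so it may be worth re-measuring on your own data. A minimal sketch of a timing helper follows; `time_ms` is a hypothetical name, and the commented-out calls assume a PySpark DataFrame named `df` that is not defined here.

```python
import time


def time_ms(fn):
    """Run fn once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# With a real PySpark DataFrame `df` (assumed, not defined here):
#   time_ms(lambda: df.count() > 0)
#   time_ms(lambda: len(df.head(1)) == 0)
#   time_ms(lambda: df.rdd.isEmpty())

# Stand-in workload so the snippet runs without Spark.
result, elapsed = time_ms(lambda: len([]) == 0)
print(result, f"{elapsed:.3f} ms")
```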

Since Spark 2.4.0 there is Dataset.isEmpty.

Its implementation is:

def isEmpty: Boolean =
  withAction("isEmpty", limit(1).groupBy().count().queryExecution) { plan =>
    plan.executeCollect().head.getLong(0) == 0
  }
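
If the same code has to run on both older and newer versions, one option is to prefer the built-in method when it exists and fall back to head(1) otherwise. A minimal sketch in Python, assuming the DataFrame object exposes `isEmpty()` on newer releases (PySpark gained it later than Scala, in 3.3.0 as far as I know). `is_empty_compat`, `OldFrame` and `NewFrame` are hypothetical names; the two classes are stand-ins so the snippet runs without Spark.

```python
def is_empty_compat(df):
    """Use df.isEmpty() when available, else fall back to head(1)."""
    is_empty = getattr(df, "isEmpty", None)
    if callable(is_empty):
        return bool(is_empty())
    return len(df.head(1)) == 0


class OldFrame:
    """Hypothetical stand-in for a DataFrame without isEmpty()."""

    def __init__(self, rows):
        self._rows = list(rows)

    def head(self, n):
        # Mimics head(n): return up to n rows as a list.
        return self._rows[:n]


class NewFrame(OldFrame):
    """Hypothetical stand-in for a DataFrame that has isEmpty()."""

    def isEmpty(self):
        return len(self._rows) == 0


print(is_empty_compat(OldFrame([])))   # True
print(is_empty_compat(NewFrame([1])))  # False
```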

Note that a DataFrame is no longer a class in Scala; it's just a type alias (probably changed with Spark 2.0):

type DataFrame = Dataset[Row]
