I'm dealing with different Spark DataFrames, which have lots of null values in many columns. I want to get any one non-null value from each column to see if that value can be converted to datetime.

I tried doing df.na.drop().first() in the hope that it would drop all rows with any null value, and from the remaining DataFrame I'd just get the first row, with all non-null values. But many of the DataFrames have so many columns with lots of null values that df.na.drop() returns an empty DataFrame.

I also tried finding whether any column has all null values, so that I could simply drop those columns before trying the above approach, but that still didn't solve the problem. Any idea how I can accomplish this in an efficient way? This code will be run many times on huge DataFrames.
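For illustration, a minimal reproduction of the problem, assuming a SparkSession named spark is available:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every row has at least one null, so dropping rows that contain any null leaves nothing.
df = spark.createDataFrame(
    [(None, "2021-01-01"), ("a", None)],
    ["col_a", "col_b"],
)

print(df.na.drop().count())   # 0 -- the DataFrame is now empty
print(df.na.drop().first())   # None, so there is no row to inspect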
Solution: To find the non-null values of PySpark DataFrame columns, use the isNotNull() function, for example df.name.isNotNull(); similarly, for non-NaN values use ~isnan(df.name).
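A minimal sketch of filtering with those predicates, using a made-up single-column DataFrame (isnan() only applies to float/double columns):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, isnan

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0,), (float("nan"),), (None,)], ["value"])

# Keep rows where "value" is neither null nor NaN.
df.filter(col("value").isNotNull() & ~isnan(col("value"))).show()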
To pick a value out of a single row, use the first() or head() functions. Syntax: dataframe.first()['column name']
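For example, on a small made-up DataFrame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("foo", 1), ("bar", 2)], ["name", "id"])

# first() and head() both return a Row (or None for an empty DataFrame),
# which can be indexed by column name.
row = df.first()            # equivalent to df.head()
if row is not None:
    print(row["name"])      # 'foo'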
In a PySpark DataFrame you can count the null, None, NaN, or empty/blank values in a column by combining isNull() from the Column class with the SQL functions isnan(), count(), and when().
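A sketch of that counting pattern on a made-up two-column DataFrame (the NaN check is only added for the float column, since isnan() errors on other types):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, isnan, when

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1.0), (None, float("nan")), (None, None)], ["name", "value"])

# Count null/blank values in "name" and null/NaN values in "value".
df.select(
    count(when(col("name").isNull() | (col("name") == ""), "name")).alias("name"),
    count(when(col("value").isNull() | isnan(col("value")), "value")).alias("value"),
).show()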
collect() is an action on an RDD or DataFrame that retrieves the data from the DataFrame. It is useful for retrieving all the elements of every row from each partition and bringing them back to the driver node/program.
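For example (collect() pulls everything to the driver, so it is only safe on small or limited results):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("foo", 1), ("bar", 2)], ["name", "id"])

# collect() gathers every row from all partitions into a Python list on the driver.
for row in df.limit(10).collect():
    print(row.asDict())   # e.g. {'name': 'foo', 'id': 1}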
You can use the first function with ignorenulls. Let's say the data looks like this:
from pyspark.sql.types import StringType, StructType, StructField

schema = StructType([
    StructField("x{}".format(i), StringType(), True) for i in range(3)
])

df = spark.createDataFrame(
    [(None, "foo", "bar"), ("foo", None, "bar"), ("foo", "bar", None)],
    schema
)
You can:
from pyspark.sql.functions import first
df.select([first(x, ignorenulls=True).alias(x) for x in df.columns]).first()
Row(x0='foo', x1='foo', x2='bar')
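As a follow-up for the original goal (checking whether each sampled value can be converted to datetime), here is a minimal driver-side sketch; the date format string is only an assumption and should match your data:

from datetime import datetime
from pyspark.sql.functions import first

sample = df.select([first(x, ignorenulls=True).alias(x) for x in df.columns]).first()

for name in df.columns:
    value = sample[name]
    try:
        # "%Y-%m-%d" is just an example format.
        datetime.strptime(value, "%Y-%m-%d")
        print(f"{name}: parses as a datetime")
    except (TypeError, ValueError):
        print(f"{name}: does not parse as a datetime")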