For a dataframe, before it is like:
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null|null|null|
|null| B| X1|
+----+----+----+
After I hope it's like:
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null| B| X1|
+----+----+----+
I prefer a general method such that it can apply when df.columns
is very long.
Thanks!
Drop all rows having at least one null value DataFrame. dropna() method is your friend. When you call dropna() over the whole DataFrame without specifying any arguments (i.e. using the default behaviour) then the method will drop all rows with at least one missing value.
fillna() function was introduced in Spark version 1.3. 1 and is used to replace null values with another specified value. It accepts two parameters namely value and subset . value corresponds to the desired value you want to replace nulls with.
We can use where or filter function to 'remove' or 'delete' rows from a DataFrame.
Providing strategy for na.drop
is all you need:
df = spark.createDataFrame([
(1, "B", "X1"), (None, None, None), (None, "B", "X1"), (None, "C", None)],
("ID", "TYPE", "CODE")
)
df.na.drop(how="all").show()
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null| B| X1|
|null| C|null|
+----+----+----+
Alternative formulation can be achieved with threshold
(number of NOT NULL
values):
df.na.drop(thresh=1).show()
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null| B| X1|
|null| C|null|
+----+----+----+
One option is to use functools.reduce
to construct the conditions:
from functools import reduce
df.filter(~reduce(lambda x, y: x & y, [df[c].isNull() for c in df.columns])).show()
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null| B| X1|
+----+----+----+
where reduce
produce a query as follows:
~reduce(lambda x, y: x & y, [df[c].isNull() for c in df.columns])
# Column<b'(NOT (((ID IS NULL) AND (TYPE IS NULL)) AND (CODE IS NULL)))'>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With