I am trying to obtain all rows in a dataframe where two flags are set to '1', and subsequently all rows where only one of the two is set to '1' and the other is NOT EQUAL to '1'.
With the following schema (three columns),
df = sqlContext.createDataFrame([('a',1,'null'),('b',1,1),('c',1,'null'),('d','null',1),('e',1,1)], #,('f',1,'NaN'),('g','bla',1)],
schema=('id', 'foo', 'bar')
)
I obtain the following dataframe:
+---+----+----+
| id| foo| bar|
+---+----+----+
| a| 1|null|
| b| 1| 1|
| c| 1|null|
| d|null| 1|
| e| 1| 1|
+---+----+----+
When I apply the desired filters, the first filter (foo=1 AND bar=1) works, but not the other (foo=1 AND NOT bar=1)
foobar_df = df.filter( (df.foo==1) & (df.bar==1) )
yields:
+---+---+---+
| id|foo|bar|
+---+---+---+
| b| 1| 1|
| e| 1| 1|
+---+---+---+
Here is the non-behaving filter:
foo_df = df.filter( (df.foo==1) & (df.bar!=1) )
foo_df.show()
+---+---+---+
| id|foo|bar|
+---+---+---+
+---+---+---+
Why is it not filtering? How can I get the rows where only foo is equal to '1'?
Why is it not filtering?
Because it is SQL, and NULL indicates missing values. Because of that, any comparison to NULL other than IS NULL and IS NOT NULL is undefined. You need either:
col("bar").isNull() | (col("bar") != 1)
or
coalesce(col("bar") != 1, lit(True))
or (PySpark >= 2.3), negating the null-safe equality:
~col("bar").eqNullSafe(1)
if you want null-safe comparisons in PySpark.
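To see why the original filter returns nothing, here is a rough plain-Python sketch of SQL's three-valued logic, with None standing in for NULL. The helpers sql_eq, sql_ne and sql_and are hypothetical names made up for this illustration and are not part of PySpark; the rows are the ones from the question with real missing values.

```python
def sql_eq(a, b):
    """SQL-style a = b: the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_ne(a, b):
    """SQL-style a != b: the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_and(x, y):
    """SQL AND: False wins; otherwise any NULL operand makes the result NULL."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


rows = [("a", 1, None), ("b", 1, 1), ("c", 1, None), ("d", None, 1), ("e", 1, 1)]

# A filter keeps a row only when the predicate is exactly True,
# so rows where the predicate evaluates to NULL are dropped.
kept = [rid for rid, foo, bar in rows
        if sql_and(sql_eq(foo, 1), sql_ne(bar, 1)) is True]

# Handling the missing value explicitly mirrors the isNull() | (!= 1) form.
kept_fixed = [rid for rid, foo, bar in rows
              if sql_eq(foo, 1) is True and (bar is None or bar != 1)]

print(kept)
print(kept_fixed)
```

For rows a and c the comparison bar != 1 evaluates to NULL rather than True, which is why the naive predicate keeps nothing while the explicit NULL check keeps both.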
Also, 'null' is not a valid way to introduce a NULL literal. You should use None to indicate missing objects.
from pyspark.sql.functions import col, coalesce, lit
df = spark.createDataFrame([
('a', 1, 1), ('a',1, None), ('b', 1, 1),
('c' ,1, None), ('d', None, 1),('e', 1, 1)
]).toDF('id', 'foo', 'bar')
df.where((col("foo") == 1) & (col("bar").isNull() | (col("bar") != 1))).show()
## +---+---+----+
## | id|foo| bar|
## +---+---+----+
## | a| 1|null|
## | c| 1|null|
## +---+---+----+
df.where((col("foo") == 1) & coalesce(col("bar") != 1, lit(True))).show()
## +---+---+----+
## | id|foo| bar|
## +---+---+----+
## | a| 1|null|
## | c| 1|null|
## +---+---+----+