I expected to keep only the rows where either a or b is 0.0 (not NaN) and c is always 0.0. The Polars documentation says to use | as "or" and & as "and". I believe I have the logic right: (((a not NaN) or (b not NaN)) and (c not NaN))
However, the output is wrong.
import polars as pl
import numpy as np
df = pl.DataFrame(
data={
"a": [0.0, 0.0, 0.0, 0.0, np.nan, np.nan, np.nan],
"b": [0.0, 0.0, np.nan, np.nan, 0.0, 0.0, np.nan],
"c": [0.0, np.nan, 0.0, np.nan, 0.0, np.nan, np.nan]
}
)
df.with_columns(
((pl.col('a').is_not_nan() | pl.col('b').is_not_nan())
& pl.col('c').is_not_nan()).alias('Keep'))
df_actual = df.filter(pl.col("Keep") is True)
print("df\n", df)
print("df_expect\n", df_expect)
print("df_actual\n", df_actual)
df
shape: (7, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 ┆ NaN │
│ 0.0 ┆ NaN ┆ 0.0 │
│ 0.0 ┆ NaN ┆ NaN │
│ NaN ┆ 0.0 ┆ 0.0 │
│ NaN ┆ 0.0 ┆ NaN │
│ NaN ┆ NaN ┆ NaN │
└─────┴─────┴─────┘
df_expect
shape: (3, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ NaN ┆ 0.0 │
│ NaN ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 ┆ 0.0 │
└─────┴─────┴─────┘
df_actual
shape: (0, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
└─────┴─────┴─────┘
The logic looks fine.
One issue is that Polars operations are not "in-place" (apart from some niche methods).
.with_columns() returns a new frame - which you are not using.
Another issue is the use of Python's is operator with Expr objects.
>>> type(pl.col("Keep"))
polars.expr.expr.Expr
>>> pl.col("Keep") is True
False
You end up running .filter(False) - hence the result of 0 rows.
If you add the column:
df_actual = df.with_columns(
((pl.col("a").is_not_nan() | pl.col("b").is_not_nan())
& pl.col("c").is_not_nan()).alias("Keep")
)
Then you can pass the column name (or pl.col("Keep")) directly to .filter():
df_actual = df_actual.filter("Keep")
You could also chain the calls e.g. df.with_columns().filter()
Or you can filter on the predicate directly, without adding a column.
df_actual = df.filter(
(pl.col("a").is_not_nan() | pl.col("b").is_not_nan())
& pl.col("c").is_not_nan()
)
shape: (3, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ NaN ┆ 0.0 │
│ NaN ┆ 0.0 ┆ 0.0 │
└─────┴─────┴─────┘