I want to perform an if-else transformation for elements in a List series. I use pl.when inside list.eval but encounter a warning message.
I have a DataFrame containing a List series whose rows have different lengths:
In [2]: df = pl.DataFrame({"Tokens": [["a", "b", "c"], ["a"], ["unknown"]]})
In [3]: df
Out[3]:
shape: (3, 1)
┌─────────────────┐
│ Tokens          │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["a", "b", "c"] │
│ ["a"]           │
│ ["unknown"]     │
└─────────────────┘
Now I want to apply an if-else transformation to each element of the List series. More specifically: lambda token: -1 if token == 'unknown' else hash(token)
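To make the intended behavior concrete, here is a minimal plain-Python sketch of that per-element logic applied to nested lists. The helper names `transform_token` and `transform_rows` are hypothetical, and Python's built-in `hash` stands in for whatever hash function is ultimately used; its string hashes normally differ between interpreter runs, so only the `-1` sentinel is predictable.

```python
def transform_token(token: str) -> int:
    # The if-else rule from the question: a sentinel for "unknown",
    # otherwise the hash of the token.
    return -1 if token == "unknown" else hash(token)


def transform_rows(rows: list[list[str]]) -> list[list[int]]:
    # Apply the rule to every element while keeping each row's length.
    return [[transform_token(token) for token in row] for row in rows]


rows = [["a", "b", "c"], ["a"], ["unknown"]]
result = transform_rows(rows)

print(result[2])  # [-1]
```

The goal is to get the same shape-preserving result from a Polars expression, without falling back to a Python-level loop.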
I tried using pl.when inside a list.eval expression. It works but raises this warning: The predicate '[(col("")) == (Utf8(__ANY__))]' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the group_by operation would. This behavior is experimental and may be subject to change
In [12]: df.with_columns(pl.col("Tokens").list.eval(pl.when(pl.element() == 'unknown').then(pl.lit(0, dtype=pl.UInt64)).otherwise(pl.element().hash())))
The predicate '[(col("")) == (Utf8(unknown))]' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the groupby operation would. This behavior is experimental and may be subject to change
Out[12]:
shape: (3, 1)
┌───────────────────────────────────┐
│ Tokens                            │
│ ---                               │
│ list[u64]                         │
╞═══════════════════════════════════╡
│ [1588745937650624681, 1558575890… │
│ [1588745937650624681]             │
│ [0]                               │
└───────────────────────────────────┘
What is the proper way to do this?
You can use the pl.when, then, otherwise syntax. (See also: Column assignment based on predicate.)
import polars as pl
print(f"Polars version: {pl.__version__}\n") # NOTE: See polars docs.*
df = pl.DataFrame({"Tokens": [["a", "b", "c"], ["a"], ["unknown"]]})
print(df)
transform_expr = (
    pl.when(pl.element() == "unknown")
    .then(pl.lit(-1))
    .otherwise(pl.element().hash())
)
df = df.with_columns(
    pl.col("Tokens").list.eval(transform_expr).alias("Tokens")
)
print(df)
gives:
Polars version: 0.20.13
shape: (3, 1)
┌─────────────────┐
│ Tokens          │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["a", "b", "c"] │
│ ["a"]           │
│ ["unknown"]     │
└─────────────────┘
shape: (3, 1)
┌───────────────────────────────────┐
│ Tokens                            │
│ ---                               │
│ list[f64]                         │
╞═══════════════════════════════════╡
│ [8.1448e17, 6.3145e15, 1.8296e19… │
│ [8.1448e17]                       │
│ [-1.0]                            │
└───────────────────────────────────┘
*polars.Expr.hash: The returned hash values are not guaranteed to be stable except within the same version of Polars.