I'm trying to get the column name containing the maximum date value in my Polars DataFrame. I found a similar question that was already answered here.
However, in my case, I have many columns, and adding them manually would be tedious. I would like to use column selectors cs.datetime()
and have tried the following:
import polars as pl
from datetime import datetime
import polars.selectors as cs
data = {
"ID" : [1,2,3],
"Values_A" : [datetime(1,1,2),datetime(1,1,3),datetime(1,1,4)],
"Values_B" : [datetime(1,1,4),datetime(1,1,7),datetime(1,1,2)]
}
df = pl.DataFrame(data)
def arg_max_horizontal(*columns: pl.Expr) -> pl.Expr:
return (
pl.concat_list(columns)
.list.arg_max()
.replace_strict({i: col_name for i, col_name in enumerate(columns)})
)
(
df
.with_columns(
Largest=arg_max_horizontal(pl.select(cs.datetime()))
)
)
You were one step away and simply needed to select the column names of interest using df.select(cs.datetime()).columns
. Then, we can unpack the list in the function call.
Note. I've adapted the type hint of arg_max_horizontal
accordingly. Moreover, (thanks to @Cameron Riddell), we can simplify conversion to a string representation using a cast to a suitable pl.Enum
.
def arg_max_horizontal(*columns: str) -> pl.Expr:
return (
pl.concat_list(columns)
.list.arg_max()
.cast(pl.Enum(columns))
)
(
df
.with_columns(
Largest=arg_max_horizontal(*df.select(cs.datetime()).columns)
)
)
shape: (3, 4)
┌─────┬─────────────────────┬─────────────────────┬──────────┐
│ ID ┆ Values_A ┆ Values_B ┆ Largest │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ datetime[μs] ┆ str │
╞═════╪═════════════════════╪═════════════════════╪══════════╡
│ 1 ┆ 0001-01-02 00:00:00 ┆ 0001-01-04 00:00:00 ┆ Values_B │
│ 2 ┆ 0001-01-03 00:00:00 ┆ 0001-01-07 00:00:00 ┆ Values_B │
│ 3 ┆ 0001-01-04 00:00:00 ┆ 0001-01-02 00:00:00 ┆ Values_A │
└─────┴─────────────────────┴─────────────────────┴──────────┘
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With