Get max date column name on polars

Question

I'm trying to get the column name containing the maximum date value in my Polars DataFrame. I found a similar question that was already answered here.

However, in my case, I have many columns, and adding them manually would be tedious. I would like to use column selectors cs.datetime() and have tried the following:

import polars as pl
from datetime import datetime
import polars.selectors as cs

data = {
    "ID" : [1,2,3],
    "Values_A" : [datetime(1,1,2),datetime(1,1,3),datetime(1,1,4)],
    "Values_B" : [datetime(1,1,4),datetime(1,1,7),datetime(1,1,2)]
}

df = pl.DataFrame(data)

def arg_max_horizontal(*columns: pl.Expr) -> pl.Expr:
    return (
        pl.concat_list(columns)
        .list.arg_max()
        .replace_strict({i: col_name for i, col_name in enumerate(columns)})
    )

(
    df
    .with_columns(
        Largest=arg_max_horizontal(pl.select(cs.datetime()))
    )
)

Hericks · Accepted Answer

You were one step away and simply needed to select the column names of interest using df.select(cs.datetime()).columns. Then, we can unpack the list in the function call.

Note. I've adapted the type hint of arg_max_horizontal accordingly. Moreover, (thanks to @Cameron Riddell), we can simplify conversion to a string representation using a cast to a suitable pl.Enum.

def arg_max_horizontal(*columns: str) -> pl.Expr:
    return (
        pl.concat_list(columns)
        .list.arg_max()
        .cast(pl.Enum(columns))
    )

(
    df
    .with_columns(
        Largest=arg_max_horizontal(*df.select(cs.datetime()).columns)
    )
)

shape: (3, 4)
┌─────┬─────────────────────┬─────────────────────┬──────────┐
│ ID  ┆ Values_A            ┆ Values_B            ┆ Largest  │
│ --- ┆ ---                 ┆ ---                 ┆ ---      │
│ i64 ┆ datetime[μs]        ┆ datetime[μs]        ┆ str      │
╞═════╪═════════════════════╪═════════════════════╪══════════╡
│ 1   ┆ 0001-01-02 00:00:00 ┆ 0001-01-04 00:00:00 ┆ Values_B │
│ 2   ┆ 0001-01-03 00:00:00 ┆ 0001-01-07 00:00:00 ┆ Values_B │
│ 3   ┆ 0001-01-04 00:00:00 ┆ 0001-01-02 00:00:00 ┆ Values_A │
└─────┴─────────────────────┴─────────────────────┴──────────┘

Get max date column name on polars

Tags:

python

dataframe

python-polars

Simon

1 Answers

Hericks

Recent Activity

Donate For Us

Get max date column name on polars

Tags:

python

dataframe

python-polars

Simon

1 Answers

Hericks

Related questions

Recent Activity

Donate For Us