Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Polars keep the biggest value using 2 categories

I have a polars dataframe that contain some ID, actions, and values :

Example Dataframe:

data = {
    "ID" : [1, 1, 2,2,3,3],
    "Action" : ["A", "A", "B", "B", "A", "A"],
    "Where" : ["Office", "Home", "Home", "Office", "Home", "Home"],
    "Value" : [1, 2, 3, 4, 5, 6]
}

df = pl.DataFrame(data)

I want to select for each ID and action the biggest value, so i know where he rather do the action.

I'm taking the following approach :

(
    df
    .select(
        pl.col("ID"),
        pl.col("Action"),
        pl.col("Where"),
        TOP = pl.col("Value").max().over(["ID", "Action"]))
)

After that , i sorted the values and keep the unique values (The first one) to maintain the desired info, however the input its incorrect :

(
    df
    .select(
        pl.col("ID"),
        pl.col("Action"),
        pl.col("Where"),
        TOP = pl.col("Value").max().over(["ID", "Action"]))
    .sort(
        pl.col("*"), descending =True
    )
    .unique(
        subset = ["ID", "Action"],
        maintain_order = True,
        keep = "first"
    )
)

Current Output :

shape: (3, 4)
┌─────┬────────┬────────┬─────┐
│ ID  ┆ Action ┆ Where  ┆ TOP │
│ --- ┆ ---    ┆ ---    ┆ --- │
│ i64 ┆ str    ┆ str    ┆ i64 │
╞═════╪════════╪════════╪═════╡
│ 3   ┆ A      ┆ Home   ┆ 6   │
│ 2   ┆ B      ┆ Office ┆ 4   │
│ 1   ┆ A      ┆ Office ┆ 2   │
└─────┴────────┴────────┴─────┘

Expected Output:

shape: (3, 4)
┌─────┬────────┬────────┬─────┐
│ ID  ┆ Action ┆ Where  ┆ TOP │
│ --- ┆ ---    ┆ ---    ┆ --- │
│ i64 ┆ str    ┆ str    ┆ i64 │
╞═════╪════════╪════════╪═════╡
│ 3   ┆ A      ┆ Home   ┆ 6   │
│ 2   ┆ B      ┆ Office ┆ 4   │
│ 1   ┆ A      ┆ Home   ┆ 2   │
└─────┴────────┴────────┴─────┘

Also, i think this approach its not the optimal way

like image 346
Simon Avatar asked Aug 31 '25 23:08

Simon


2 Answers

The over and unique could be combined into a group_by

  • .arg_max() can give you the index of the max
  • .get() will extract the corresponding values at that index
(df.group_by("ID", "Action")
   .agg(
      pl.all().get(pl.col("Value").arg_max())
   )
)
shape: (3, 4)
┌─────┬────────┬────────┬───────┐
│ ID  ┆ Action ┆ Where  ┆ Value │
│ --- ┆ ---    ┆ ---    ┆ ---   │
│ i64 ┆ str    ┆ str    ┆ i64   │
╞═════╪════════╪════════╪═══════╡
│ 1   ┆ A      ┆ Home   ┆ 2     │
│ 2   ┆ B      ┆ Office ┆ 4     │
│ 3   ┆ A      ┆ Home   ┆ 6     │
└─────┴────────┴────────┴───────┘
like image 102
jqurious Avatar answered Sep 03 '25 14:09

jqurious


You can use the following builtin in function in polars to do this easily.

with_columns: this provides you with with the information based on col(value) over both ID and Action taken.

sort: this just sorts in descending order

unique : this allows you to drop all duplicates

import polars as pl

data = {
    "ID": [1, 1, 2, 2, 3, 3],
    "Action": ["A", "A", "B", "B", "A", "A"],
    "Where": ["Office", "Home", "Home", "Office", "Home", "Home"],
    "Value": [1, 2, 3, 4, 5, 6]
}

df = pl.DataFrame(data)

result = (
    df
    .with_columns(
        TOP=pl.col("Value").max().over(["ID", "Action"])
    )
    .sort(by="TOP", descending=True)
    .unique(subset=["ID", "Action"], maintain_order=True, keep="first")
)

print(result)
shape: (3, 5)
┌─────┬────────┬────────┬───────┬─────┐
│ ID  ┆ Action ┆ Where  ┆ Value ┆ TOP │
│ --- ┆ ---    ┆ ---    ┆ ---   ┆ --- │
│ i64 ┆ str    ┆ str    ┆ i64   ┆ i64 │
╞═════╪════════╪════════╪═══════╪═════╡
│ 3   ┆ A      ┆ Home   ┆ 5     ┆ 6   │
│ 2   ┆ B      ┆ Home   ┆ 3     ┆ 4   │
│ 1   ┆ A      ┆ Office ┆ 1     ┆ 2   │
└─────┴────────┴────────┴───────┴─────┘

Alternatively you can go with groupby and filter which eliminates the use of unique even tho they sound the same

result = (
    df.group_by(["ID", "Action"]).agg(
        [
            pl.col("Value").max().alias("TOP"),
            pl.col("Where")
            .filter(pl.col("Value") == pl.col("Value").max())
            .first()
            .alias("Where"),
        ]
    )
    .sort("TOP", descending=True)
)

another result:

shape: (3, 4)
┌─────┬────────┬─────┬────────┐
│ ID  ┆ Action ┆ TOP ┆ Where  │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 3   ┆ A      ┆ 6   ┆ Home   │
│ 2   ┆ B      ┆ 4   ┆ Office │
│ 1   ┆ A      ┆ 2   ┆ Home   │
└─────┴────────┴─────┴────────┘

like image 24
temi Avatar answered Sep 03 '25 14:09

temi