Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to perform a max_by window aggregation in Polars?

I am trying to use polars to do a window aggregate over one value, but map it back to another.

For example, if i wanted to get the name of the max value in a group, instead of (or in combination to) just the max value.

assuming an input of something like this.

df = pl.from_repr("""
┌───────┬──────┬───────┐
│ label ┆ name ┆ value │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ str  ┆ f64   │
╞═══════╪══════╪═══════╡
│ a.    ┆ foo  ┆ 1.0   │
│ a.    ┆ bar  ┆ 2.0   │
│ b.    ┆ baz  ┆ 1.5   │
│ b.    ┆ boo  ┆ -1.0  │
└───────┴──────┴───────┘
""")
# 'max_by' is not a real method, just using it to express what i'm trying to achieve. 
df.select(pl.col('label'), pl.col('name').max_by('value').over('label'))

i want an output like this

shape: (2, 2)
┌───────┬──────┐
│ label ┆ name │
│ ---   ┆ ---  │
│ str   ┆ str  │
╞═══════╪══════╡
│ a.    ┆ bar  │
│ b.    ┆ baz  │
└───────┴──────┘

ideally with the value too. But i know i can easily add that in via pl.col('value').max().over('label').

shape: (2, 3)
┌───────┬──────┬───────┐
│ label ┆ name ┆ value │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ str  ┆ f64   │
╞═══════╪══════╪═══════╡
│ a.    ┆ bar  ┆ 2.0   │
│ b.    ┆ baz  ┆ 1.5   │
└───────┴──────┴───────┘
like image 619
Cory Grinstead Avatar asked Nov 07 '25 12:11

Cory Grinstead


2 Answers

You were close. There is a sort_by expression that can be used.

df.group_by('label').agg(pl.all().sort_by('value').last())
shape: (2, 3)
┌───────┬──────┬───────┐
│ label ┆ name ┆ value │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ str  ┆ f64   │
╞═══════╪══════╪═══════╡
│ a.    ┆ bar  ┆ 2.0   │
│ b.    ┆ baz  ┆ 1.5   │
└───────┴──────┴───────┘

If you need a windowed version of this:

df.with_columns(
    pl.col('name','value').sort_by('value').last().over('label').name.suffix("_max")
)
shape: (4, 5)
┌───────┬──────┬───────┬──────────┬───────────┐
│ label ┆ name ┆ value ┆ name_max ┆ value_max │
│ ---   ┆ ---  ┆ ---   ┆ ---      ┆ ---       │
│ str   ┆ str  ┆ f64   ┆ str      ┆ f64       │
╞═══════╪══════╪═══════╪══════════╪═══════════╡
│ a.    ┆ foo  ┆ 1.0   ┆ bar      ┆ 2.0       │
│ a.    ┆ bar  ┆ 2.0   ┆ bar      ┆ 2.0       │
│ b.    ┆ baz  ┆ 1.5   ┆ baz      ┆ 1.5       │
│ b.    ┆ boo  ┆ -1.0  ┆ baz      ┆ 1.5       │
└───────┴──────┴───────┴──────────┴───────────┘

You can do it using DataFrame.filter

import polars as pl

df = pl.DataFrame({
    'label': ['a.', 'a.', 'b.', 'b.'],
    'name': ['foo', 'bar', 'baz', 'boo'],
    'value': [1, 2, 1.5, -1]
})

res = df.filter(
    pl.col('value') == pl.max('value').over('label')
)

Output:

>>> res

shape: (2, 3)
┌───────┬──────┬───────┐
│ label ┆ name ┆ value │
│ ---   ┆ ---  ┆ ---   │
│ str   ┆ str  ┆ f64   │
╞═══════╪══════╪═══════╡
│ a.    ┆ bar  ┆ 2.0   │
│ b.    ┆ baz  ┆ 1.5   │
└───────┴──────┴───────┘
like image 33
Rodalm Avatar answered Nov 09 '25 07:11

Rodalm