 

Rank within group in Polars?

I have a Polars dataframe like so:

import polars as pl

df = pl.from_repr("""
┌─────┬─────┬─────┐
│ c1  ┆ c2  ┆ c3  │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ a   ┆ 1   │
│ a   ┆ a   ┆ 1   │
│ a   ┆ b   ┆ 1   │
│ a   ┆ c   ┆ 1   │
│ d   ┆ a   ┆ 1   │
│ d   ┆ b   ┆ 1   │
└─────┴─────┴─────┘
""")

I am trying to assign a number to each group of (c2, c3) within c1, so that would look like this:

┌─────┬─────┬─────┬──────┐
│ c1  ┆ c2  ┆ c3  ┆ rank │
│ --- ┆ --- ┆ --- ┆ ---  │
│ str ┆ str ┆ i64 ┆ u32  │
╞═════╪═════╪═════╪══════╡
│ a   ┆ a   ┆ 1   ┆ 0    │
│ a   ┆ a   ┆ 1   ┆ 0    │
│ a   ┆ b   ┆ 1   ┆ 1    │
│ a   ┆ c   ┆ 1   ┆ 2    │
│ d   ┆ a   ┆ 1   ┆ 0    │
│ d   ┆ b   ┆ 1   ┆ 1    │
└─────┴─────┴─────┴──────┘

How do I accomplish this?

I see how to do a global ranking:

df.join(
    df.select("c1", "c2", "c3")
    .unique()
    .with_columns(rank=pl.int_range(1, pl.len() + 1)),
    on=["c1", "c2", "c3"]
)
shape: (6, 4)
┌─────┬─────┬─────┬──────┐
│ c1  ┆ c2  ┆ c3  ┆ rank │
│ --- ┆ --- ┆ --- ┆ ---  │
│ str ┆ str ┆ i64 ┆ i64  │
╞═════╪═════╪═════╪══════╡
│ a   ┆ a   ┆ 1   ┆ 1    │
│ a   ┆ a   ┆ 1   ┆ 1    │
│ a   ┆ b   ┆ 1   ┆ 2    │
│ a   ┆ c   ┆ 1   ┆ 4    │
│ d   ┆ a   ┆ 1   ┆ 3    │
│ d   ┆ b   ┆ 1   ┆ 5    │
└─────┴─────┴─────┴──────┘

but that numbers the groups across the whole frame, not within each c1 group. I also wonder if it is possible to do this with over() instead of the groupby/join pattern.

ldrg asked Oct 18 '25 14:10



1 Answer

Build a struct from columns c2 and c3 with pl.struct("c2", "c3"), take its dense rank over c1 (equal rows share a rank and no ranks are skipped), then subtract 1 because ranks start at 1 by default:

pl.struct("c2", "c3").rank("dense").over("c1") - 1

Full code:

import polars as pl

df = pl.DataFrame(
    {
        "c1": ["a", "a", "a", "a", "d", "d"],
        "c2": ["a", "a", "b", "c", "a", "b"],
        "c3": [1, 1, 1, 1, 1, 1],
    }
)

df2 = df.with_columns(rank=pl.struct("c2", "c3").rank("dense").over("c1") - 1)

print(df2)

Output:

┌─────┬─────┬─────┬──────┐
│ c1  ┆ c2  ┆ c3  ┆ rank │
│ --- ┆ --- ┆ --- ┆ ---  │
│ str ┆ str ┆ i64 ┆ u32  │
╞═════╪═════╪═════╪══════╡
│ a   ┆ a   ┆ 1   ┆ 0    │
│ a   ┆ a   ┆ 1   ┆ 0    │
│ a   ┆ b   ┆ 1   ┆ 1    │
│ a   ┆ c   ┆ 1   ┆ 2    │
│ d   ┆ a   ┆ 1   ┆ 0    │
│ d   ┆ b   ┆ 1   ┆ 1    │
└─────┴─────┴─────┴──────┘
Dogbert answered Oct 21 '25 02:10
