Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Number of unique items in a group [duplicate]

Given a dataframe, how do I count the number of [multi-column] unique records within a group, and return the result as a single column?

I'd like to do something like this:

from datetime import date

import polars as pl
from polars import col

day1 = date(1982, 1, 14)
day2 = date(1984, 8, 30)
day3 = date(1992, 1, 8)
day4 = date(1999, 12, 31)
df = pl.DataFrame({
    "ID": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    "gender": ['M', 'M', 'F', 'F', 'M', 'M', 'M', 'M'],
    "birth_dt": [day1, day1, day2, day2, day3, day3, day4, day4]
})
df.select(col("gender", "birth_dt").unique().count().over("ID")) # not quite what I want
shape: (8, 2)
┌────────┬──────────┐
│ gender ┆ birth_dt │
│ ---    ┆ ---      │
│ u32    ┆ u32      │
╞════════╪══════════╡
│ 2      ┆ 2        │
│ 2      ┆ 2        │
│ 2      ┆ 2        │
│ 2      ┆ 2        │
│ 1      ┆ 2        │
│ 1      ┆ 2        │
│ 1      ┆ 2        │
│ 1      ┆ 2        │
└────────┴──────────┘
df.group_by("ID").agg(col("gender", "birth_dt").n_unique()) # also not quite what I want
shape: (2, 3)
┌─────┬────────┬──────────┐
│ ID  ┆ gender ┆ birth_dt │
│ --- ┆ ---    ┆ ---      │
│ str ┆ u32    ┆ u32      │
╞═════╪════════╪══════════╡
│ B   ┆ 1      ┆ 2        │
│ A   ┆ 2      ┆ 2        │
└─────┴────────┴──────────┘

However, the unique() is called for each column individually. I'd like instead to return the number of unique (gender, birth_dt) combinations for each ID.

like image 390
NedDasty Avatar asked Dec 05 '25 14:12

NedDasty


1 Answers

One easy way is to bind the columns of interest into a struct, and then count the unique items. For example:

(
    df
    .group_by("ID")
    .agg(
        pl.struct("gender", "birth_dt").n_unique().alias("result")
    )
)
shape: (2, 2)
┌─────┬────────┐
│ ID  ┆ result │
│ --- ┆ ---    │
│ str ┆ u32    │
╞═════╪════════╡
│ A   ┆ 2      │
│ B   ┆ 2      │
└─────┴────────┘

Used with .over()

df.with_columns(
    pl.struct("gender", "birth_dt").n_unique().over("ID").alias("result")
)
shape: (8, 4)
┌─────┬────────┬────────────┬────────┐
│ ID  ┆ gender ┆ birth_dt   ┆ result │
│ --- ┆ ---    ┆ ---        ┆ ---    │
│ str ┆ str    ┆ date       ┆ u32    │
╞═════╪════════╪════════════╪════════╡
│ A   ┆ M      ┆ 1982-01-14 ┆ 2      │
│ A   ┆ M      ┆ 1982-01-14 ┆ 2      │
│ A   ┆ F      ┆ 1984-08-30 ┆ 2      │
│ A   ┆ F      ┆ 1984-08-30 ┆ 2      │
│ B   ┆ M      ┆ 1992-01-08 ┆ 2      │
│ B   ┆ M      ┆ 1992-01-08 ┆ 2      │
│ B   ┆ M      ┆ 1999-12-31 ┆ 2      │
│ B   ┆ M      ┆ 1999-12-31 ┆ 2      │
└─────┴────────┴────────────┴────────┘

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!