Given a dataframe, how do I count the number of [multi-column] unique records within a group, and return the result as a single column?
I'd like to do something like this:
from datetime import date
import polars as pl
from polars import col
day1 = date(1982, 1, 14)
day2 = date(1984, 8, 30)
day3 = date(1992, 1, 8)
day4 = date(1999, 12, 31)
df = pl.DataFrame({
"ID": ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
"gender": ['M', 'M', 'F', 'F', 'M', 'M', 'M', 'M'],
"birth_dt": [day1, day1, day2, day2, day3, day3, day4, day4]
})
df.select(col("gender", "birth_dt").unique().count().over("ID")) # not quite what I want
shape: (8, 2)
┌────────┬──────────┐
│ gender ┆ birth_dt │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞════════╪══════════╡
│ 2 ┆ 2 │
│ 2 ┆ 2 │
│ 2 ┆ 2 │
│ 2 ┆ 2 │
│ 1 ┆ 2 │
│ 1 ┆ 2 │
│ 1 ┆ 2 │
│ 1 ┆ 2 │
└────────┴──────────┘
df.group_by("ID").agg(col("gender", "birth_dt").n_unique()) # also not quite what I want
shape: (2, 3)
┌─────┬────────┬──────────┐
│ ID ┆ gender ┆ birth_dt │
│ --- ┆ --- ┆ --- │
│ str ┆ u32 ┆ u32 │
╞═════╪════════╪══════════╡
│ B ┆ 1 ┆ 2 │
│ A ┆ 2 ┆ 2 │
└─────┴────────┴──────────┘
However, the unique() is called for each column individually. I'd like instead to return the number of unique (gender, birth_dt) combinations for each ID.
One easy way is to bind the columns of interest into a struct, and then count the unique items. For example:
(
df
.group_by("ID")
.agg(
pl.struct("gender", "birth_dt").n_unique().alias("result")
)
)
shape: (2, 2)
┌─────┬────────┐
│ ID ┆ result │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════╪════════╡
│ A ┆ 2 │
│ B ┆ 2 │
└─────┴────────┘
Used with .over()
df.with_columns(
pl.struct("gender", "birth_dt").n_unique().over("ID").alias("result")
)
shape: (8, 4)
┌─────┬────────┬────────────┬────────┐
│ ID ┆ gender ┆ birth_dt ┆ result │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ date ┆ u32 │
╞═════╪════════╪════════════╪════════╡
│ A ┆ M ┆ 1982-01-14 ┆ 2 │
│ A ┆ M ┆ 1982-01-14 ┆ 2 │
│ A ┆ F ┆ 1984-08-30 ┆ 2 │
│ A ┆ F ┆ 1984-08-30 ┆ 2 │
│ B ┆ M ┆ 1992-01-08 ┆ 2 │
│ B ┆ M ┆ 1992-01-08 ┆ 2 │
│ B ┆ M ┆ 1999-12-31 ┆ 2 │
│ B ┆ M ┆ 1999-12-31 ┆ 2 │
└─────┴────────┴────────────┴────────┘
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With