Polars equivalent to Pandas min_count on groupby

I'm trying to find the Polars equivalent of the min_count parameter on a pandas groupby, as in df.groupby(key).sum(min_count=N).

Suppose we have this dataframe:

import polars as pl

df = pl.from_repr("""
┌───────┬───────┐
│ fruit ┆ price │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ a     ┆ 1     │
│ a     ┆ 3     │
│ a     ┆ 5     │
│ b     ┆ 10    │
│ b     ┆ 10    │
│ b     ┆ 10    │
│ b     ┆ 20    │
└───────┴───────┘
""")

How can I group by the fruit key with the constraint that a group must have at least 4 values for its sum to be returned?

So instead of

┌───────┬───────┐
│ fruit ┆ price │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ b     ┆ 50    │
│ a     ┆ 9     │
└───────┴───────┘

I'd have only fruit b in the output, since it's the only group with at least 4 elements:

┌───────┬───────┐
│ fruit ┆ price │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ b     ┆ 50    │
└───────┴───────┘

viniciusbaca asked Oct 11 '25 16:10


1 Answer

I don't think there's a built-in min_count for this, but you can just filter:

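(
    df.group_by("fruit")
    # sum the prices and record how many rows each group has
    .agg(pl.col("price").sum(), pl.len())
    # keep only the groups with at least 4 rows
    .filter(pl.col("len") >= 4)
    # the helper count column is no longer needed
    .drop("len")
)

shape: (1, 2)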
┌───────┬───────┐
│ fruit ┆ price │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ b     ┆ 50    │
└───────┴───────┘

ignoring_gravity answered Oct 14 '25 05:10


