
Rolling Indexing in Polars?

I'd like to ask if anyone knows how to do rolling indexing in Polars. I have tried a few solutions which did not work for me (I'll show them below).

What I'd like to do: index the number of occurrences within the past X days, grouped by Name. Example: let's say I'd like to index occurrences within the past 2 days:

import polars as pl

df = pl.from_repr("""
┌─────────┬─────────────────────┬─────────┐
│ Name    ┆ Date                ┆ Counter │
│ ---     ┆ ---                 ┆ ---     │
│ str     ┆ datetime[ns]        ┆ i64     │
╞═════════╪═════════════════════╪═════════╡
│ John    ┆ 2023-01-01 00:00:00 ┆ 1       │
│ John    ┆ 2023-01-01 00:00:00 ┆ 2       │
│ John    ┆ 2023-01-01 00:00:00 ┆ 3       │
│ John    ┆ 2023-01-01 00:00:00 ┆ 4       │
│ John    ┆ 2023-01-02 00:00:00 ┆ 5       │
│ John    ┆ 2023-01-02 00:00:00 ┆ 6       │
│ John    ┆ 2023-01-02 00:00:00 ┆ 7       │
│ John    ┆ 2023-01-02 00:00:00 ┆ 8       │
│ John    ┆ 2023-01-03 00:00:00 ┆ 5       │
│ John    ┆ 2023-01-03 00:00:00 ┆ 6       │
│ New Guy ┆ 2023-01-01 00:00:00 ┆ 1       │
└─────────┴─────────────────────┴─────────┘
""")

In this case, the counter restarts at "1" once rows fall outside the past X days (e.g. for 3 Jan 23, counting starts again from 2 Jan 23), or whenever a new name appears.

What I've tried:

(df.rolling(index_column='Date', period='2d', group_by='Name') 
   .agg((pl.col("Date").rank(method='ordinal')).alias("Counter"))
)

The above does not work because it outputs:

┌─────────┬─────────────────────┬──────────────────────────┐
│ Name    ┆ Date                ┆ Counter                  │
│ ---     ┆ ---                 ┆ ---                      │
│ str     ┆ datetime[ns]        ┆ list[u32]                │
╞═════════╪═════════════════════╪══════════════════════════╡
│ John    ┆ 2023-01-01 00:00:00 ┆ [1, 2, 3, 4]             │
│ John    ┆ 2023-01-01 00:00:00 ┆ [1, 2, 3, 4]             │
│ John    ┆ 2023-01-01 00:00:00 ┆ [1, 2, 3, 4]             │
│ John    ┆ 2023-01-01 00:00:00 ┆ [1, 2, 3, 4]             │
│ John    ┆ 2023-01-02 00:00:00 ┆ [1, 2, 3, 4, 5, 6, 7, 8] │
│ John    ┆ 2023-01-02 00:00:00 ┆ [1, 2, 3, 4, 5, 6, 7, 8] │
│ John    ┆ 2023-01-02 00:00:00 ┆ [1, 2, 3, 4, 5, 6, 7, 8] │
│ John    ┆ 2023-01-02 00:00:00 ┆ [1, 2, 3, 4, 5, 6, 7, 8] │
│ John    ┆ 2023-01-03 00:00:00 ┆ [1, 2, 3, 4, 5, 6]       │
│ John    ┆ 2023-01-03 00:00:00 ┆ [1, 2, 3, 4, 5, 6]       │
│ New Guy ┆ 2023-01-01 00:00:00 ┆ [1]                      │
└─────────┴─────────────────────┴──────────────────────────┘
I also tried:

(df.with_columns(Mask=1)
   .with_columns(Counter=pl.col("Mask").rolling_sum_by(window_size='2d', by="Date"))
)

But it outputs:

┌─────────┬─────────────────────┬─────────┬──────┐
│ Name    ┆ Date                ┆ Counter ┆ Mask │
│ ---     ┆ ---                 ┆ ---     ┆ ---  │
│ str     ┆ datetime[ns]        ┆ i32     ┆ i32  │
╞═════════╪═════════════════════╪═════════╪══════╡
│ John    ┆ 2023-01-01 00:00:00 ┆ 5       ┆ 1    │
│ John    ┆ 2023-01-01 00:00:00 ┆ 5       ┆ 1    │
│ John    ┆ 2023-01-01 00:00:00 ┆ 5       ┆ 1    │
│ John    ┆ 2023-01-01 00:00:00 ┆ 5       ┆ 1    │
│ John    ┆ 2023-01-02 00:00:00 ┆ 9       ┆ 1    │
│ John    ┆ 2023-01-02 00:00:00 ┆ 9       ┆ 1    │
│ John    ┆ 2023-01-02 00:00:00 ┆ 9       ┆ 1    │
│ John    ┆ 2023-01-02 00:00:00 ┆ 9       ┆ 1    │
│ John    ┆ 2023-01-03 00:00:00 ┆ 6       ┆ 1    │
│ John    ┆ 2023-01-03 00:00:00 ┆ 6       ┆ 1    │
│ New Guy ┆ 2023-01-01 00:00:00 ┆ 5       ┆ 1    │
└─────────┴─────────────────────┴─────────┴──────┘

It also cannot handle "New Guy" correctly, because rolling_sum_by does not support something like group_by=["Name", "Date"].

I also tried:

df.with_columns(Counter=pl.col("Date").rank(method="ordinal").over("Name", "Date"))

The above code works correctly, but it can only be used for indexing within the same day (i.e. period="1d").


Additional Notes

I also did this in Excel, as well as with a brute-force "for"-loop. Both produced the correct result, but they struggled with large amounts of data.
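For illustration, a minimal sketch of such a brute-force loop (assuming the rows are already ordered by Name and Date, as in the example df above); it is quadratic in the number of rows, which is why it struggles on large data:

from datetime import timedelta

# For each row, count rows with the same Name whose Date falls within the
# past 2 days (the window (Date - 2d, Date]), up to and including this row.
rows = df.to_dicts()
counters = []
for i, row in enumerate(rows):
    counters.append(sum(
        1
        for other in rows[: i + 1]
        if other["Name"] == row["Name"]
        and row["Date"] - timedelta(days=2) < other["Date"] <= row["Date"]
    ))

result = df.with_columns(pl.Series("Counter", counters))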

What I read: some references that might help with answers (most didn't work for me because they use a fixed-size rolling window instead of a dynamic window by "Date"):

  • https://stackoverflow.com/questions/77633868/how-to-implement-rolling-rank-in-polars-version-0-19

  • https://github.com/pola-rs/polars/issues/4808

  • How to do rolling() grouped by day by hour in Polars?

  • How to group_by and rolling in polars?

  • https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.rank.html

  • https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.rolling.html#polars.DataFrame.rolling

asked Oct 27 '25 by user24758287

1 Answer

You could start with an approach that gives the maximum count for each group (using pl.len() within the aggregation) and post-process the Counter column to make its values increase within each group.

(
    df
    .rolling(index_column="Date", period="2d", group_by="Name")
    .agg(
        pl.len().alias("Counter")
    )
    .with_columns(
        (pl.col("Counter") - pl.len() + 1 + pl.int_range(pl.len())).over("Name", "Date")
    )
)
shape: (11, 3)
┌─────────┬────────────┬─────────┐
│ Name    ┆ Date       ┆ Counter │
│ ---     ┆ ---        ┆ ---     │
│ str     ┆ date       ┆ i64     │
╞═════════╪════════════╪═════════╡
│ John    ┆ 2023-01-01 ┆ 1       │
│ John    ┆ 2023-01-01 ┆ 2       │
│ John    ┆ 2023-01-01 ┆ 3       │
│ John    ┆ 2023-01-01 ┆ 4       │
│ John    ┆ 2023-01-02 ┆ 5       │
│ John    ┆ 2023-01-02 ┆ 6       │
│ John    ┆ 2023-01-02 ┆ 7       │
│ John    ┆ 2023-01-02 ┆ 8       │
│ John    ┆ 2023-01-03 ┆ 5       │
│ John    ┆ 2023-01-03 ┆ 6       │
│ New Guy ┆ 2023-01-01 ┆ 1       │
└─────────┴────────────┴─────────┘
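For reference, a sketch of the intermediate result before the post-processing step (same df as above): the rolling aggregation alone yields one constant count per (Name, Date) group.

# The rolling aggregation by itself: Counter is the number of rows in the
# 2-day window ending at each row's Date, so it is constant within each
# (Name, Date) group.
intermediate = (
    df
    .rolling(index_column="Date", period="2d", group_by="Name")
    .agg(pl.len().alias("Counter"))
)
# For the example data, Counter is 4, 4, 4, 4, 8, 8, 8, 8, 6, 6 for John
# and 1 for New Guy.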

Explanation: after the aggregation, the Counter column takes a constant value V within each (Name, Date) group of length L. The objective is to make Counter take the values V-L+1 to V (one value per row) instead.

Therefore, we can

  1. subtract L-1 from Counter and
  2. add an int range with increasing values from 0 to L-1.
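For example, for John's rows on 2023-01-02 in the data above, the rolling count is V = 8 and the (Name, Date) group has L = 4 rows, so the adjusted values are 8 - 4 + 1 + [0, 1, 2, 3] = [5, 6, 7, 8]. A minimal sketch of that arithmetic in isolation:

# Per-group adjustment: the constant rolling count V becomes the run V-L+1 .. V.
V, L = 8, 4  # John on 2023-01-02: window count 8, group length 4
print([V - L + 1 + i for i in range(L)])  # [5, 6, 7, 8]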
answered Oct 29 '25 by Hericks


