
Rolling linear regression slope over n rows by group

My data are at the ID-month level, with one payment per month, sorted by id and dt. What I'd like to do is create a new column that, for each group, holds the linear regression slope of the next N months of payments. Here is a sample:

import polars as pl

data = {"id": ['a','a','a','a','a','a'], "dt": ['2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01', '2024-05-01', '2024-06-01',], "pmt": [3341,3205,3287,3544,6536,5994]}
df = pl.DataFrame(data, schema={"id": pl.String, "dt": pl.Date, 'pmt':pl.Int64}).with_columns(pl.col("dt").set_sorted())
print(df)

Which looks like the following:

shape: (6, 3)
┌─────┬────────────┬──────┐
│ id  ┆ dt         ┆ pmt  │
│ --- ┆ ---        ┆ ---  │
│ str ┆ date       ┆ i64  │
╞═════╪════════════╪══════╡
│ a   ┆ 2024-01-01 ┆ 3341 │
│ a   ┆ 2024-02-01 ┆ 3205 │
│ a   ┆ 2024-03-01 ┆ 3287 │
│ a   ┆ 2024-04-01 ┆ 3544 │
│ a   ┆ 2024-05-01 ┆ 6536 │
│ a   ┆ 2024-06-01 ┆ 5994 │
└─────┴────────────┴──────┘

I know that the linear regression slope for this ID over this block of 6 months is 671.86, so that value would go on the first row. The next row would hold the slope for the next rolling 6 months, and so on. The result would look like this (rows 2-6 contain made-up slopes, just for illustration):

shape: (6, 4)
┌─────┬────────────┬──────┬───────────────────┐
│ id  ┆ dt         ┆ pmt  ┆ slope_pmt_next6mo │
│ --- ┆ ---        ┆ ---  ┆ ---               │
│ str ┆ date       ┆ i64  ┆ f32               │
╞═════╪════════════╪══════╪═══════════════════╡
│ a   ┆ 2024-01-01 ┆ 3341 ┆ 671.859985        │
│ a   ┆ 2024-02-01 ┆ 3205 ┆ 700.25            │
│ a   ┆ 2024-03-01 ┆ 3287 ┆ 646.210022        │
│ a   ┆ 2024-04-01 ┆ 3544 ┆ 683.880005        │
│ a   ┆ 2024-05-01 ┆ 6536 ┆ 547.179993        │
│ a   ┆ 2024-06-01 ┆ 5994 ┆ 525.48999         │
└─────┴────────────┴──────┴───────────────────┘
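
The 671.86 figure for the first row can be checked by hand with the usual least-squares slope formula. The sketch below is plain Python and assumes the time variable is simply the month position 1 through 6; any evenly spaced month index would give the same slope.

```python
# Payments for id 'a', January through June 2024 (from the sample above)
pmt = [3341, 3205, 3287, 3544, 6536, 5994]
# Assumed time variable: month position 1..6
months = list(range(1, len(pmt) + 1))

x_mean = sum(months) / len(months)
y_mean = sum(pmt) / len(pmt)

# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(months, pmt))
denominator = sum((x - x_mean) ** 2 for x in months)
slope = numerator / denominator

print(round(slope, 2))  # 671.86
```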

I assume I'd use some flavor of polars.DataFrame.rolling, but I don't quite follow the syntax. I tried the following, but the result has NaN in the first two rows and I'm not sure why:

def ols_slope(y: pl.Expr) -> pl.Expr:
    # Calculate linear regression slope
    x = y.rank("ordinal")
    numerator = ((x - x.mean())*(y - y.mean())).sum()
    denominator = ((x - x.mean())**2).sum()
    return numerator / denominator

(
    df
    .rolling(index_column=("dt"), period="6mo", closed='none', check_sorted=False)
    .agg(ols_slope(pl.col("pmt")).alias("pmt_slope"))
)
kstats9pt3 asked Nov 02 '25 13:11


1 Answer

I'm a little confused by your usage of x = y.rank("ordinal"), which ranks the payments by value rather than by date, so I'm building the time variable from dt instead.

If we assume dt always falls on the first of the month, no month is skipped within a window, and you want the time variable to be the number of months, you can do:

def ols_slope(x: str | pl.Expr, y: str | pl.Expr) -> pl.Expr:
    # Convert string input to pl.col
    if isinstance(x, str):
        x = pl.col(x)
    if isinstance(y, str):
        y = pl.col(y)
    # Convert date to a 1-based month index
    x = pl.arg_where(x == pl.date_range(x.min(), x.max(), '1mo')) + 1
    # Calculate linear regression slope
    numerator = ((x - x.mean()) * (y - y.mean())).sum()
    denominator = ((x - x.mean()) ** 2).sum()
    return numerator / denominator

As for getting NaN at the front instead of the end: rolling uses -period as the default offset, so each window looks backward from the current row. You want the window to start at the current row, which means offset='-1mo'. Altogether you'd do:

(
    df
    .rolling('dt', period='6mo',group_by='id',offset='-1mo')
    .agg(slope=ols_slope('dt','pmt'))
)

Incidentally, check out the polars_ds plugin for more tools around data science including regressions.

For example, with it, you could do

import polars_ds as pds

# Get a numeric value for month
month_number = (
    pl.arg_where(
        pl.col('dt') == pl.date_range(pl.col('dt').min(), pl.col('dt').max(), '1mo')
    ) + 1
)
(
    df
    .rolling('dt', period='6mo', group_by='id', offset='-1mo')
    .agg(
        slope=pds.query_lstsq_report(
            month_number,
            target='pmt',
            add_bias=True
        ).first().struct.field('coeff')
    )
)
Dean MacGregor answered Nov 04 '25 08:11


