I want to create a new column that contains the max, min and average of the next X elements.
After searching for a while the only solution I was able to find was:
df = df.with_columns(
pl.col("price").shift(-20).rolling_max(window_size=20).alias('max'),
pl.col("price").shift(-20).rolling_min(window_size=20).alias('min')
)
However for the first 20 rows there is no data and results in nulls because the .rolling_max function applies the windows of the previous rows and not the next.
Is there a better way to achieve this?
@RomanPekar's solution works well for the creation of few columns. However, when working with multiple columns it could be preferable to use a full forward-looking rolling context using pl.DataFrame.group_by_dynamic as follows.
Data.
df = pl.DataFrame({
"price": [1, 2, 6, 1, 3, 5, 8, 7],
})
Usage of pl.DataFrame.group_by_dynamic.
I've added multiple aggregations of the rolling window and a separate window column to clarify the approach.
(
df
.group_by_dynamic(
index_column=pl.int_range(0, pl.len()),
every="1i", period="2i", offset="0i",
)
.agg(
pl.col("price").first(),
pl.col("price").alias("window"),
pl.col("price").max().alias("min"),
pl.col("price").max().alias("max"),
)
)
Output.
shape: (8, 5)
┌─────────┬───────┬───────────┬─────┬─────┐
│ literal ┆ price ┆ window ┆ min ┆ max │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ list[i64] ┆ i64 ┆ i64 │
╞═════════╪═══════╪═══════════╪═════╪═════╡
│ 0 ┆ 1 ┆ [1, 2] ┆ 2 ┆ 2 │
│ 1 ┆ 2 ┆ [2, 6] ┆ 6 ┆ 6 │
│ 2 ┆ 6 ┆ [6, 1] ┆ 6 ┆ 6 │
│ 3 ┆ 1 ┆ [1, 3] ┆ 3 ┆ 3 │
│ 4 ┆ 3 ┆ [3, 5] ┆ 5 ┆ 5 │
│ 5 ┆ 5 ┆ [5, 8] ┆ 8 ┆ 8 │
│ 6 ┆ 8 ┆ [8, 7] ┆ 8 ┆ 8 │
│ 7 ┆ 7 ┆ [7] ┆ 7 ┆ 7 │
└─────────┴───────┴───────────┴─────┴─────┘
This approach also allows a separate by column to perform these aggregations within groups defined by a separate column.
I think you can use reverse() to apply rolling window to next rows insted of previous ones:
I've made a bit more simple example so you can see the results easier:
df = pl.DataFrame({
"price": [1, 2, 6, 1, 3, 5, 8, 7],
})
┌───────┐
│ price │
│ --- │
│ i64 │
╞═══════╡
│ 1 │
│ 2 │
│ 6 │
│ 1 │
│ 3 │
│ 5 │
│ 8 │
│ 7 │
└───────┘
df.with_columns(
pl.col("price")
.reverse()
.rolling_max(window_size=2, min_periods=1)
.reverse()
.alias('max')
)
┌───────┬─────┐
│ price ┆ max │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════╪═════╡
│ 1 ┆ 2 │ # max of [1,2]
│ 2 ┆ 6 │ # max of [2,6]
│ 6 ┆ 6 │ # max of [6,1]
│ 1 ┆ 3 │ # max of [1,3]
│ 3 ┆ 5 │ # max of [3,5]
│ 5 ┆ 8 │ # max of [5,8]
│ 8 ┆ 8 │ # max of [8,7]
│ 7 ┆ 7 │ # max of [7]
└───────┴─────┘
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With