Consider the following pl.DataFrame:
import polars as pl
df = pl.DataFrame(
{
"symbol": ["s1", "s1", "s2", "s2"],
"signal": [0, 1, 2, 0],
"trade": [None, 1, None, -1],
}
)
shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════╪════════╪═══════╡
│ s1 ┆ 0 ┆ null │
│ s1 ┆ 1 ┆ 1 │
│ s2 ┆ 2 ┆ null │
│ s2 ┆ 0 ┆ -1 │
└────────┴────────┴───────┘
Now, I need to group the dataframe by symbol and check if the first row in every group in column signal is not equal to 0 (zero). It this equals to True, I need to replace the corresponding cell in column trade with the value in the cell in signal.
Here's what I am actually looking for:
shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════╪════════╪═══════╡
│ s1 ┆ 0 ┆ null │
│ s1 ┆ 1 ┆ 1 │
│ s2 ┆ 2 ┆ 2 │ <- copy value from the ``signal`` column
│ s2 ┆ 0 ┆ -1 │
└────────┴────────┴───────┘
For this, a when-then-otherwise construct may be used.
True exactly for the first rows (create index on the fly using pl.int_range) in each group with signal not equal to 0.signal or trade column.df.with_columns(
trade=pl.when(
pl.col("signal") != 0,
pl.int_range(pl.len()) == 0,
).then("signal").otherwise("trade").over("symbol")
)
shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════╪════════╪═══════╡
│ s1 ┆ 0 ┆ null │
│ s1 ┆ 1 ┆ 1 │
│ s2 ┆ 2 ┆ 2 │
│ s2 ┆ 0 ┆ -1 │
└────────┴────────┴───────┘
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With