I have a data frame with 4 columns. I have sorted this data frame by 'group' and 'timestamp' beforehand.
import pandas as pd

df = pd.DataFrame(
    {
        "type": ['type0', 'type1', 'type2', 'type3', 'type1', 'type3', 'type0', 'type1', 'type3', 'type3'],
        "group": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
        "timestamp": ["20220105 07:52:46", "20220105 07:53:11", "20220105 07:53:55", "20220105 07:59:12", "20220105 08:24:13", "20220105 08:48:19", "20220105 11:01:30", "20220105 11:15:16", "20220105 12:13:36", "20220105 12:19:44"],
        "price": [0, 1.5, 2.5, 3, 3.2, 3.1, 0.5, 3, 3.25, pd.NA],
    }
)
>>> df
type group timestamp price
0 type0 1 20220105 07:52:46 0
1 type1 1 20220105 07:53:11 1.5
2 type2 1 20220105 07:53:55 2.5
3 type3 1 20220105 07:59:12 3
4 type1 1 20220105 08:24:13 3.2
5 type3 1 20220105 08:48:19 3.1
6 type0 2 20220105 11:01:30 0.5
7 type1 2 20220105 11:15:16 3
8 type3 2 20220105 12:13:36 3.25
9 type3 2 20220105 12:19:44 <NA>
After grouping by the column 'group', I want to create a 'new_price' column as per the following logic:
For each 'type3' row in a group (i.e., df['type'] == 'type3'), get the price from the PREVIOUS 'type1' or 'type2' row in the group.
For type0/type1/type2 rows, keep the same price as in the input data frame.
My Solution:
My solution below works as long as a group has no two consecutive 'type3' rows. When there are two consecutive 'type3' rows, the second one gets the wrong price: I want the price from the previous 'type1' or 'type2' row in the group, but my solution picks up the price of the first 'type3' row instead.
df = df.sort_values(by=["group", "timestamp"])

# keep prices of type1/type2/type3 rows, then take the previous row's value within each group
required_types_mask = df['type'].isin(['type1', 'type2', 'type3'])
temp_series = df.loc[:, 'price'].where(required_types_mask).groupby(df['group']).shift(1)

# replace the price of type3 rows with the shifted value
type_3_mask = df['type'].eq('type3')
df.loc[:, 'new_price'] = df.loc[:, 'price'].mask(type_3_mask, temp_series)
My result:
type group timestamp price new_price
0 type0 1 20220105 07:52:46 0 0
1 type1 1 20220105 07:53:11 1.5 1.5
2 type2 1 20220105 07:53:55 2.5 2.5
3 type3 1 20220105 07:59:12 3 2.5
4 type1 1 20220105 08:24:13 3.2 3.2
5 type3 1 20220105 08:48:19 3.1 3.2
6 type0 2 20220105 11:01:30 0.5 0.5
7 type1 2 20220105 11:15:16 3 3
8 type3 2 20220105 12:13:36 3.25 3
9 type3 2 20220105 12:19:44 <NA> 3.25 <- Incorrect price
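Inspecting temp_series for the 'type3' rows shows where the wrong value comes from: shift(1) just takes the immediately preceding row in the group, and for row 9 that row is itself a 'type3' row (reprs below are approximate):

temp_series[df['type'].eq('type3')]
3     2.5
5     3.2
8     3
9     3.25   <- price of row 8, another 'type3' row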
Expected result:
type group timestamp price new_price
0 type0 1 20220105 07:52:46 0 0
1 type1 1 20220105 07:53:11 1.5 1.5
2 type2 1 20220105 07:53:55 2.5 2.5
3 type3 1 20220105 07:59:12 3 2.5
4 type1 1 20220105 08:24:13 3.2 3.2
5 type3 1 20220105 08:48:19 3.1 3.2
6 type0 2 20220105 11:01:30 0.5 0.5
7 type1 2 20220105 11:15:16 3 3
8 type3 2 20220105 12:13:36 3.25 3
9 type3 2 20220105 12:19:44 <NA> 3 <- Correct price
We can mask the price on the 'type0'/'type3' rows and then ffill within each group:
import numpy as np

s = df.price.mask(df.type.isin(['type0', 'type3']))  # type0/type3 prices -> NaN so they are never a fill source
df['new'] = np.where(df.type.eq('type3'), s.groupby(df['group']).ffill(), df['price'])
df
type group timestamp price new
0 type0 1 20220105 07:52:46 0 0
1 type1 1 20220105 07:53:11 1.5 1.5
2 type2 1 20220105 07:53:55 2.5 2.5
3 type3 1 20220105 07:59:12 3 2.5
4 type1 1 20220105 08:24:13 3.2 3.2
5 type3 1 20220105 08:48:19 3.1 3.2
6 type0 2 20220105 11:01:30 0.5 0.5
7 type1 2 20220105 11:15:16 3 3
8 type3 2 20220105 12:13:36 3.25 3
9 type3 2 20220105 12:19:44 <NA> 3
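To see why consecutive 'type3' rows no longer break, you can look at the intermediate group-wise ffill: after masking, it always carries forward the last 'type1'/'type2' price (reprs below are approximate):

s.groupby(df['group']).ffill()
0    NaN
1    1.5
2    2.5
3    2.5
4    3.2
5    3.2
6    NaN
7    3
8    3
9    3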
You can use a series of masks and ffill.
First mask 'type3' and 'type0' (the latter so it is never used as a fill source), then ffill with the previous 'type1'/'type2' price, and finally restore the original values of the 'type0' rows.
All of this is done per group.
df['new_price'] = (
    df.groupby('group')
      .apply(lambda d: d['price']
             .mask(d['type'].isin(['type3', 'type0']))   # type0/3 to NaN
             .ffill()                                     # fill with previous type1/2
             .mask(d['type'].eq('type0'), d['price'])     # restore type0
      )
      .values
)
output:
type group timestamp price new_price
0 type0 1 20220105 07:52:46 0 0
1 type1 1 20220105 07:53:11 1.5 1.5
2 type2 1 20220105 07:53:55 2.5 2.5
3 type3 1 20220105 07:59:12 3 2.5
4 type1 1 20220105 08:24:13 3.2 3.2
5 type3 1 20220105 08:48:19 3.1 3.2
6 type0 2 20220105 11:01:30 0.5 0.5
7 type1 2 20220105 11:15:16 3 3.0
8 type3 2 20220105 12:13:36 3.25 3.0
9 type3 2 20220105 12:19:44 <NA> 3.0
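A possible variation, assuming a reasonably recent pandas: with group_keys=False the result of apply keeps the original index, so you can assign it directly and drop the .values detour (the behaviour of groupby.apply has shifted slightly across pandas versions, so treat this as a sketch rather than a drop-in guarantee):

df['new_price'] = (
    df.groupby('group', group_keys=False)
      .apply(lambda d: d['price']
             .mask(d['type'].isin(['type3', 'type0']))
             .ffill()
             .mask(d['type'].eq('type0'), d['price'])
      )
)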