I have a dataframe containing open/close, candle color and number of consecutive candles.
date open close color run
00:01:00 100 102 g 1
00:02:00 102 104 g 2
00:03:00 104 106 g 3
00:04:00 106 105 r 1
00:05:00 105 101 r 2
00:06:00 101 102 g 1
00:06:00 102 103 g 2
I'm trying to calculate the absolute value of the difference between the open of the first candle in the run and the close of the last candle in the run and apply the difference to each line. The result would look like
date open close color run run_length
00:01:00 100 102 g 1 2 # abs(100 - 102)
00:02:00 102 104 g 2 4 # abs(100 - 104)
00:03:00 104 106 g 3 6 # abs(100 - 106)
00:04:00 106 105 r 1 1 # abs(106 - 105)
00:05:00 105 101 r 2 5 # abs(106 - 101)
00:06:00 101 102 g 1 1 # abs(101 - 102)
00:06:00 102 103 g 2 2 # abs(101 - 103)
I have read two other posts that come close but don't quite get to the solution I'm looking for:
get first and last values in a groupby
Pandas number of consecutive occurrences in previous rows
I'm using df.groupby((df['color'] != df['color'].shift()).cumsum()) to group the rows by the color of the candle (this is how I calculated the color and the run count), and I can get the first and last values of each group using .agg(['first', 'last']).stack(), but this doesn't allow me to apply the difference per line of the original dataframe.
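For reference, the grouping described above can be sketched as follows. This is only an illustration: the rule that a candle is green when close > open is an assumption (the post does not say how color was derived), and the names run_id and summary are hypothetical. It also shows why the aggregated result cannot be assigned back directly: it has one row per run rather than one row per candle.

```python
import pandas as pd

# Sample open/close values from the question.
df = pd.DataFrame({
    "open":  [100, 102, 104, 106, 105, 101, 102],
    "close": [102, 104, 106, 105, 101, 102, 103],
})

# Assumption: a candle is green ("g") when close > open, otherwise red ("r").
df["color"] = ["g" if c > o else "r" for o, c in zip(df["open"], df["close"])]

# A new run starts wherever the color differs from the previous row;
# the cumulative sum of those change flags yields one id per run.
run_id = (df["color"] != df["color"].shift()).cumsum().rename("run_id")

# Position of each candle within its run, starting at 1.
df["run"] = df.groupby(run_id).cumcount() + 1

# Aggregating gives one row per run, so it is not aligned with df's rows.
summary = df.groupby(run_id).agg(
    first_open=("open", "first"),
    last_close=("close", "last"),
)
```

Here summary has three rows (one per run) while df has seven, which is the mismatch described in the question.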
Are you looking for a groupby? For more robustness, follow @Wen's suggestion in the comments and perform a groupby using the cumsum trick:
df['run_length'] = df.groupby(
df['color'].ne(df['color'].shift()).cumsum()
).open.transform('first').sub(df.close).abs()
df
date open close color run run_length
0 00:01:00 100 102 g 1 2
1 00:02:00 102 104 g 2 4
2 00:03:00 104 106 g 3 6
3 00:04:00 106 105 r 1 1
4 00:05:00 105 101 r 2 5
5 00:06:00 101 102 g 1 1
6 00:06:00 102 103 g 2 2
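A self-contained version of the same idea, rebuilt from the sample data in the question, might look like the sketch below. The intermediate names run_id and first_open, and the extra run_total column, are hypothetical and only there for illustration. The key point is that transform('first') returns a value for every original row (the run's first open repeated across the run), so it can be subtracted from close row by row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date":  ["00:01:00", "00:02:00", "00:03:00", "00:04:00",
              "00:05:00", "00:06:00", "00:06:00"],
    "open":  [100, 102, 104, 106, 105, 101, 102],
    "close": [102, 104, 106, 105, 101, 102, 103],
    "color": list("gggrrgg"),
})

# One id per run of consecutive equal colors (the cumsum trick).
run_id = df["color"].ne(df["color"].shift()).cumsum()

# transform keeps the original row count: each row receives the open
# of the first candle in its run.
first_open = df.groupby(run_id)["open"].transform("first")

# Distance from the run's first open to each row's own close.
df["run_length"] = (first_open - df["close"]).abs()

# Alternative reading of the question (an assumption): use the close of
# the run's last candle, giving a single value repeated across the run.
last_close = df.groupby(run_id)["close"].transform("last")
df["run_total"] = (first_open - last_close).abs()
```

With this data, run_length comes out as 2, 4, 6, 1, 5, 1, 2, matching the expected output in the question, while run_total is 6, 6, 6, 5, 5, 2, 2.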