
Pandas difference between first and last grouped by consecutive events

Tags: python, pandas

I have a dataframe containing the open and close of each candle, the candle color, and the number of consecutive candles of that color so far (run).

    date open close color  run
00:01:00  100   102     g    1
00:02:00  102   104     g    2
00:03:00  104   106     g    3
00:04:00  106   105     r    1
00:05:00  105   101     r    2
00:06:00  101   102     g    1  
00:06:00  102   103     g    2 
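To make the example reproducible, here is a minimal sketch that rebuilds the sample from the open/close prices and derives color and run. The color rule is an assumption (green when close > open, red otherwise; the question doesn't say how a candle with close == open is colored), and the date column is kept as plain strings exactly as posted.

```python
import numpy as np
import pandas as pd

# Sample prices from the question (the repeated 00:06:00 timestamp is kept as posted)
df = pd.DataFrame({
    'date': ['00:01:00', '00:02:00', '00:03:00', '00:04:00',
             '00:05:00', '00:06:00', '00:06:00'],
    'open':  [100, 102, 104, 106, 105, 101, 102],
    'close': [102, 104, 106, 105, 101, 102, 103],
})

# Assumption: green ('g') when the candle closes above its open, red ('r') otherwise
df['color'] = np.where(df['close'] > df['open'], 'g', 'r')

# A new run starts whenever the color differs from the previous row's color
run_id = df['color'].ne(df['color'].shift()).cumsum()

# Position of each candle within its run, starting at 1
df['run'] = df.groupby(run_id).cumcount() + 1
print(df)
```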

For each row, I'm trying to calculate the absolute difference between the open of the first candle in the run and the close of that row's candle (so the last row of a run holds the difference for the whole run), and store it on every line of the dataframe. The result would look like:

    date open close color  run  run_length
00:01:00  100   102     g    1      2        # abs(100 - 102)
00:02:00  102   104     g    2      4        # abs(100 - 104)
00:03:00  104   106     g    3      6        # abs(100 - 106)
00:04:00  106   105     r    1      1        # abs(106 - 105)
00:05:00  105   101     r    2      5        # abs(106 - 101)
00:06:00  101   102     g    1      1        # abs(101 - 102)
00:06:00  102   103     g    2      2        # abs(101 - 103)
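As the comments show, each value is a running difference: the open of the run's first candle against the close of the current candle. A plain-Python reference pins down that rule before reaching for pandas; the helper name running_run_diff is hypothetical and the rows are hard-coded from the sample above.

```python
# Rows from the question as (open, close, color) tuples
rows = [
    (100, 102, 'g'), (102, 104, 'g'), (104, 106, 'g'),
    (106, 105, 'r'), (105, 101, 'r'),
    (101, 102, 'g'), (102, 103, 'g'),
]

def running_run_diff(rows):
    """For each row: |open of the run's first candle - close of this candle|."""
    result = []
    prev_color = None
    first_open = None
    for open_, close, color in rows:
        if color != prev_color:
            # Color changed, so this candle starts a new run
            first_open = open_
            prev_color = color
        result.append(abs(first_open - close))
    return result

print(running_run_diff(rows))  # [2, 4, 6, 1, 5, 1, 2]
```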

I have read two other posts that come close but don't quite get to the solution I'm looking for:

get first and last values in a groupby

Pandas number of consecutive occurrences in previous rows

I'm using df.groupby((df['color'] != df['color'].shift()).cumsum()) to group consecutive rows by candle color (this is how I calculated the color and the run count). I can get the first and last values of each group using .agg(['first', 'last']).stack(), but that collapses each group to a single row, so I can't apply the difference to each line of the original dataframe.
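The aggregated result doesn't have to be thrown away. A minimal sketch, assuming the sample data with the date and run columns omitted, maps each row's run key back to its run's first open with Series.map. This is a swapped-in alternative to using transform, not the approach from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'open':  [100, 102, 104, 106, 105, 101, 102],
    'close': [102, 104, 106, 105, 101, 102, 103],
    'color': ['g', 'g', 'g', 'r', 'r', 'g', 'g'],
})

# One integer key per run of consecutive same-color candles
run_id = (df['color'] != df['color'].shift()).cumsum()

# agg() collapses each run to one row holding its first and last open
summary = df.groupby(run_id)['open'].agg(['first', 'last'])
print(summary)

# Look up each row's run key in the summary to get one first-open value per row
first_open = run_id.map(summary['first'])
df['run_length'] = (first_open - df['close']).abs()
print(df)
```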

asked May 03 '18 at 19:05 by Michael Sutherland


1 Answer

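Are you looking for a groupby? For more robustness, follow @Wen's suggestion in the comments and group with the cumsum trick, which gives each run of consecutive same-color candles its own key, then broadcast each run's first open back to every row with transform: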
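To see what the cumsum trick produces, here is a small sketch on the color column alone (hard-coded from the sample). Comparing each value with the previous one marks the start of each run, and the cumulative sum of those booleans turns the marks into run numbers. It also shows why grouping by color alone is not enough: both green runs would land in the same group.

```python
import pandas as pd

color = pd.Series(['g', 'g', 'g', 'r', 'r', 'g', 'g'])

# True where the color differs from the previous row (row 0 compares to NaN, so True)
starts_run = color.ne(color.shift())

# The running total goes up by one at each run start, giving one key per run
run_id = starts_run.cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 3, 3]

# Grouping by color alone merges both green runs into a single group
print(color.groupby(color).ngroups)   # 2
print(color.groupby(run_id).ngroups)  # 3
```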

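# Label each run of consecutive same-color candles with its own integer key
run_id = df['color'].ne(df['color'].shift()).cumsum()

# Repeat each run's first open on every row of the run, then take |first open - close|
df['run_length'] = df.groupby(run_id)['open'].transform('first').sub(df['close']).abs()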

df    
       date  open  close color  run  run_length
0  00:01:00   100    102     g    1           2
1  00:02:00   102    104     g    2           4
2  00:03:00   104    106     g    3           6
3  00:04:00   106    105     r    1           1
4  00:05:00   105    101     r    2           5
5  00:06:00   101    102     g    1           1
6  00:06:00   102    103     g    2           2
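For a copy-and-run version, here is a self-contained sketch of the same groupby/transform approach, assuming the question's sample data typed in by hand. The key design point is that transform('first') returns a Series aligned to the original index (one value per row), whereas agg returns one row per group, which is why the result can be assigned straight back as a column.

```python
import pandas as pd

df = pd.DataFrame({
    'date':  ['00:01:00', '00:02:00', '00:03:00', '00:04:00',
              '00:05:00', '00:06:00', '00:06:00'],
    'open':  [100, 102, 104, 106, 105, 101, 102],
    'close': [102, 104, 106, 105, 101, 102, 103],
    'color': ['g', 'g', 'g', 'r', 'r', 'g', 'g'],
    'run':   [1, 2, 3, 1, 2, 1, 2],
})

# Key each run of consecutive same-color candles
run_id = df['color'].ne(df['color'].shift()).cumsum()

# Same length and index as df: each run's first open repeated on its rows
first_open = df.groupby(run_id)['open'].transform('first')

df['run_length'] = first_open.sub(df['close']).abs()
print(df)
```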
answered Nov 11 '22 at 06:11 by cs95