pandas apply with inputs from multiple rows

Question

I need to do an apply on a dataframe using inputs from multiple rows. As a simple example, I can do the following if all the inputs are from a single row:

df['c'] = df[['a','b']].apply(lambda x: awesome stuff, axis=1) 
# or 
df['d'] = df[['b','c']].shift(1).apply(...) # to get the values from the previous row

However, if I need 'a' from the current row, and 'b' from the previous row, is there a way to do that with apply? I could add a new 'bshift' column and then just use df[['a','bshift']] but it seems there must be a more direct way.

Related but separate, when accessing a specific value in the df, is there a way to combine labeled indexing with integer-offset? E.g. I know the label of the current row but need the row before. Something like df.at['labelIknow'-1, 'a'] (which of course doesn't work). This is for when I'm forced to iterate through rows. Thanks in advance.

Edit: Some info on what I'm doing etc. I have a pandas store containing tables of OHLC bars (one table per security). When doing backtesting, currently I pull the full date range I need for a security into memory, and then resample it into a frequency that makes sense for the test at hand. Then I do some vectorized operations for things like trade entry signals etc. Finally I loop over the data from start to finish doing the actual backtest, e.g. checking for trade entry exit, drawdown etc - this looping part is the part I'm trying to speed up.

JohnE · Accepted Answer

This should directly answer your question and let you use apply, although I'm not sure it's ultimately any better than a two-line solution. It does avoid creating extra variables at least.

df['c'] = pd.concat([ df['a'], df['a'].shift() ], axis=1).apply(np.mean,axis=1)

That will put the mean of 'a' values from the current and previous rows into 'c', for example.

This isn't as general, but for simpler cases you can do something like this (continuing the mean example):

df['c'] = ( df['a'] + df['a'].shift() ) / 2

That is about 10x faster than the concat() method on my tiny example dataset. I imagine that's as fast as you could do it, if you can code it in that style.

You could also look into reshaping the data with stack() and hierarchical indexing. That would be a way to get all your variables into the same row but I think it will likely be more complicated than the concat method or just creating intermediate variables via shift().

pandas apply with inputs from multiple rows

Tags:

python

pandas

fantabolous

1 Answers

JohnE

Recent Activity

Donate For Us

pandas apply with inputs from multiple rows

Tags:

python

pandas

fantabolous

1 Answers

JohnE

Related questions

Recent Activity

Donate For Us