
Python pandas dataframe add previous row values

I have a pandas dataframe that looks like this:

                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00     0     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00     0     0     0    0
2011-01-18 16:00:00     0     0     0    0

My goal is to fill the rows by adding the previous row values. The result would look like this:

                     AAPL   IBM  GOOG  XOM
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00  1500     0     0    0
2011-01-13 16:00:00     0  4000     0    0
2011-01-14 16:00:00     0  4000     0    0
2011-01-18 16:00:00     0  4000     0    0

I tried to iterate through the dataframe index with

    for date in df.index:

and to increment dates with

    dt_nextDate = date + dt.timedelta(days=1)

but the dataframe index has gaps for weekends, so the next calendar day is not always present.

Can I iterate through the index from the second row to the end, refer back to the previous row and add the values?

Georges Cunty asked Sep 29 '13 09:09



1 Answer

Your example result is not what the algorithm you describe would produce, so I'm not sure exactly what you are asking for.

The desired result you showed is a cumulative sum, which you can get using:

>>> df.cumsum()
                     AAPL   IBM  GOOG  XOM
index                                     
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00  1500     0     0    0
2011-01-13 16:00:00     0  4000     0    0
2011-01-14 16:00:00     0  4000     0    0
2011-01-18 16:00:00     0  4000     0    0
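
If you want to try this yourself, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand (the construction code is my assumption; the question only shows the printed frame) and checks that `cumsum` gives the desired result. Note that `cumsum` works by row position, so the weekend gaps in the index do not matter and no date arithmetic is needed.

```python
import pandas as pd

# Rebuild the frame shown in the question (values copied from its printout).
index = pd.to_datetime([
    "2011-01-10 16:00:00",
    "2011-01-11 16:00:00",
    "2011-01-12 16:00:00",
    "2011-01-13 16:00:00",
    "2011-01-14 16:00:00",
    "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
        "GOOG": [0, 0, 0, 0, 0, 0],
        "XOM": [0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# Running total down each column; rows are processed in order,
# regardless of how far apart their timestamps are.
running = df.cumsum()
print(running)
```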

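But what you described in words, and the loop you started to write, look more like a rolling sum with a window size of 2, where each row is added to the one before it (`pd.rolling_sum` was deprecated in pandas 0.18 in favor of `df.rolling(2).sum()`):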

>>> result = pd.rolling_sum(df, 2)
>>> result
                     AAPL   IBM  GOOG  XOM
index                                     
2011-01-10 16:00:00   NaN   NaN   NaN  NaN
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00 -1500  4000     0    0
2011-01-18 16:00:00     0     0     0    0

The first row is NaN because it has no previous row. To fix that, copy the first row over from the original frame:

>>> result.iloc[0, :] = df.iloc[0, :]
>>> result
                     AAPL   IBM  GOOG  XOM
index                                     
2011-01-10 16:00:00  1500     0     0    0
2011-01-11 16:00:00  1500     0     0    0
2011-01-12 16:00:00     0     0     0    0
2011-01-13 16:00:00 -1500  4000     0    0
2011-01-14 16:00:00 -1500  4000     0    0
2011-01-18 16:00:00     0     0     0    0
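
On current pandas the same idea can be sketched with the `rolling` method. This is only an illustration, assuming pandas 0.24 or later (for `fill_value` in `shift`) and a hand-built copy of the question's frame. Passing `min_periods=1` lets the first row sum over itself alone, so the NaN patch above should not be needed. The `shift` line spells out "this row plus the previous row" directly, which is what the question asked about doing in a loop.

```python
import pandas as pd

# Hand-built copy of the frame from the question (an assumption for the demo).
index = pd.to_datetime([
    "2011-01-10 16:00:00",
    "2011-01-11 16:00:00",
    "2011-01-12 16:00:00",
    "2011-01-13 16:00:00",
    "2011-01-14 16:00:00",
    "2011-01-18 16:00:00",
])
df = pd.DataFrame(
    {
        "AAPL": [1500, 0, 0, -1500, 0, 0],
        "IBM": [0, 0, 0, 4000, 0, 0],
        "GOOG": [0, 0, 0, 0, 0, 0],
        "XOM": [0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# Window of 2 rows; min_periods=1 means the first row is summed on its own
# instead of producing NaN.
pairwise = df.rolling(window=2, min_periods=1).sum()

# Equivalent formulation: each row plus the row before it.
# fill_value=0 stands in for the missing "previous row" of the first row.
shifted = df + df.shift(1, fill_value=0)

# rolling().sum() returns floats; cast back to int for display.
print(pairwise.astype(int))
```

Both versions go by row position rather than by calendar date, so the weekend gaps in the index are not a problem.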
Viktor Kerkez answered Sep 24 '22 23:09
