
Adding a column that is the result of the difference between consecutive rows in pandas

Let's say I have a dataframe like this:

    A   B
0   a   b
1   c   d
2   e   f 
3   g   h

0, 1, 2, 3 are times; a, c, e, g is one time series and b, d, f, h is another. I need to add two columns to the original dataframe, obtained by computing the differences between consecutive rows for certain columns.

So I need something like this:

    A   B   dA
0   a   b  (a-c)
1   c   d  (c-e)
2   e   f  (e-g)
3   g   h   NaN

I saw the diff method on DataFrame/Series, but it works slightly differently: by default it subtracts the previous row, so the first element becomes NaN instead of the last.

AMM asked Apr 17 '14 20:04



2 Answers

Use shift.

df['dA'] = df['A'] - df['A'].shift(-1)
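
To make the one-liner above concrete, here is a minimal sketch on a small numeric frame. The sample numbers are made up for illustration, and the intermediate name next_a is only there to show what shift(-1) produces.

```python
import pandas as pd

# Made-up sample values, purely for illustration
df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})

# shift(-1) moves every value up one row, so row i holds the value
# that was originally in row i + 1; the last row becomes NaN
next_a = df["A"].shift(-1)

# Current row minus the next row; the last row stays NaN
df["dA"] = df["A"] - next_a
print(df)
```

Because the last row has no successor, its dA is NaN, which matches the layout asked for in the question. The new column is float, since NaN cannot be stored in an integer column.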
exp1orer answered Oct 12 '22 12:10


You could use diff and pass -1 as the periods argument:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})
>>> df["dA"] = df["A"].diff(-1)
>>> df
   A   B  dA
0  9  12   5
1  4   7   2
2  2   5   1
3  1   4 NaN

[4 rows x 3 columns]
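
Since the question asks for a difference column for more than one series, the same idea can be applied to several columns at once. This is only a sketch: the "d" prefix and the use of add_prefix plus join are my own choices for naming and attaching the new columns, not something the question requires.

```python
import pandas as pd

df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})

# DataFrame.diff(-1) computes row i minus row i + 1 for every selected column
deltas = df[["A", "B"]].diff(-1)

# Rename the difference columns to dA, dB (naming convention assumed here)
deltas = deltas.add_prefix("d")

# Attach the new columns to the original frame, aligned on the index
result = df.join(deltas)
print(result)
```

Selecting the columns first keeps the differencing limited to the series you care about, and the last row of each new column is NaN because it has no following row.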
DSM answered Oct 12 '22 11:10