Let's say I have a dataframe like this:
A B
0 a b
1 c d
2 e f
3 g h
0, 1, 2, 3 are times; a, c, e, g is one time series and b, d, f, h is another. I need to add two columns to the original dataframe, computed as the differences between consecutive rows for certain columns.
So I need something like this:
A B dA
0 a b (a-c)
1 c d (c-e)
2 e f (e-g)
3 g h NaN
I saw that the dataframe/series has a method called diff, but it works slightly differently: the first element becomes NaN instead of the last.
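The diff() function calculates the difference between consecutive DataFrame elements. Its periods parameter is an integer giving the number of periods to shift when computing the difference; it defaults to 1, and a negative value takes the difference with a following row instead of a preceding one.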
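To make the effect of periods concrete, here is a minimal sketch on a small numeric Series. The values are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Illustrative values only.
s = pd.Series([9, 4, 2, 1])

# Default periods=1: each element minus the previous one,
# so the first entry has nothing to subtract and becomes NaN.
backward = s.diff()

# periods=-1: each element minus the next one,
# so the last entry becomes NaN instead.
forward = s.diff(periods=-1)

print(backward.tolist())
print(forward.tolist())
```

The second form puts the NaN in the last row, which is the layout the question asks for.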
Because diff is equivalent to subtracting a shifted copy of the data, we can also use the shift method to subtract between rows. The Pandas shift method is a pre-step to calculating the difference between two rows: it lets you see the shifted data directly before subtracting. The Pandas diff method simply calculates the difference, abstracting that step away.
To sum the values in each row of a DataFrame, use the sum() function with axis=1. With axis=1, the values are added across the columns, giving one total per row.
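As a quick illustration of the axis argument, the sketch below sums a small made-up DataFrame both ways. The column names and values are assumptions chosen for the example.

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})

# axis=1 adds across the columns: one total per row.
row_totals = df.sum(axis=1)

# axis=0 (the default) adds down the rows: one total per column.
column_totals = df.sum(axis=0)

print(row_totals.tolist())
print(column_totals.to_dict())
```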
Use shift.
df['dA'] = df['A'] - df['A'].shift(-1)
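Since the question asks for two new columns, here is a minimal sketch that applies the same shift approach to both A and B. It uses made-up numeric data, and the name dB is an assumption that follows the question's dA naming.

```python
import pandas as pd

# Illustrative numeric data; the question's a, b, c, ... stand for numbers.
df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})

# shift(-1) moves each value up one row, so row i sees the value from row i+1.
# The last row has no following value, so its difference is NaN.
for col in ["A", "B"]:
    df["d" + col] = df[col] - df[col].shift(-1)

print(df)
```

The result should match what diff(-1) gives for the same columns, so the choice between the two is mostly a matter of readability.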
You could use diff and pass -1 as the periods argument:
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})
>>> df["dA"] = df["A"].diff(-1)
>>> df
   A   B  dA
0  9  12   5
1  4   7   2
2  2   5   1
3  1   4 NaN
[4 rows x 3 columns]