I have a pandas series of dates and cumulative values like this:
'2015-01-01': 1
'2015-01-02': 3
'2015-01-03': 7
'2015-01-04': 8
Can I use pandas to convert them in "deltas" like this?
'2015-01-01': 0
'2015-01-02': 2
'2015-01-03': 4
'2015-01-04': 1
Or should I just do it manually?
You can just call diff
, the default period is 1
:
In [21]:
df['diff'] = df['Cumulative_Value'].diff()
df
Out[21]:
Dates Cumulative_Value diff
0 2015-01-01 1 NaN
1 2015-01-02 3 2
2 2015-01-03 7 4
3 2015-01-04 8 1
You can also use shift
which is a more general purpose operation that returns a series shifted by a period (again the default is 1), in this case though on this size dataset it is slower:
In [25]:
%timeit df['Cumulative_Value'].diff()
%timeit df['Cumulative_Value'] - df['Cumulative_Value'].shift()
%timeit df[['Cumulative_Value']].apply(lambda x: x - x.shift(), axis=1)
10000 loops, best of 3: 109 µs per loop
1000 loops, best of 3: 330 µs per loop
100 loops, best of 3: 4.02 ms per loop
The last example shows how much slower looping over each row is (I've used a double [[]]
to force it to return a dataframe so I can pass param axis=1
which applies the lambda function row-wise) compared to the first 2 methods which are vectorised. Always seek a vectorised approach to take advantage of the significantly faster operations.
So you can see here that diff
is ~4000x faster than looping over each row, as it is vectorised it will scale much better than a non-vectorised approach.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With