Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python, Pandas. Converting from cumulative values to deltas

I have a pandas series of dates and cumulative values like this:

'2015-01-01': 1
'2015-01-02': 3
'2015-01-03': 7
'2015-01-04': 8

Can I use pandas to convert them in "deltas" like this?

'2015-01-01': 0   
'2015-01-02': 2
'2015-01-03': 4
'2015-01-04': 1

Or should I just do it manually?

like image 991
xpanta Avatar asked Jan 24 '15 20:01

xpanta


1 Answers

You can just call diff, the default period is 1:

In [21]:

df['diff'] = df['Cumulative_Value'].diff()
df
Out[21]:
       Dates  Cumulative_Value  diff
0 2015-01-01                 1   NaN
1 2015-01-02                 3     2
2 2015-01-03                 7     4
3 2015-01-04                 8     1

You can also use shift which is a more general purpose operation that returns a series shifted by a period (again the default is 1), in this case though on this size dataset it is slower:

In [25]:

%timeit df['Cumulative_Value'].diff()
%timeit df['Cumulative_Value'] - df['Cumulative_Value'].shift()
%timeit df[['Cumulative_Value']].apply(lambda x: x - x.shift(), axis=1)
10000 loops, best of 3: 109 µs per loop
1000 loops, best of 3: 330 µs per loop
100 loops, best of 3: 4.02 ms per loop

The last example shows how much slower looping over each row is (I've used a double [[]] to force it to return a dataframe so I can pass param axis=1 which applies the lambda function row-wise) compared to the first 2 methods which are vectorised. Always seek a vectorised approach to take advantage of the significantly faster operations.

So you can see here that diff is ~4000x faster than looping over each row, as it is vectorised it will scale much better than a non-vectorised approach.

like image 108
EdChum Avatar answered Sep 19 '22 13:09

EdChum