
How to calculate differences between consecutive rows in pandas data frame?


I've got a data frame, df, with three columns: count_a, count_b and date; the counts are floats, and the dates are consecutive days in 2015.

I'm trying to compute the day-over-day change in both the count_a and count_b columns, that is, the difference between each row and the preceding row for both of those columns. I've set the date as the index, but I can't work out how to do this; I've seen a couple of hints about using pd.Series and pd.DataFrame.diff, but haven't found an applicable answer or set of instructions.

I'm a bit stuck, and would appreciate some guidance here.

Here's what my data frame looks like:

import pandas as pd
from numpy import nan
from pandas import Timestamp

df = pd.DataFrame({
    'count_a': {
        Timestamp('2015-01-01 00:00:00'): 34175.0,
        Timestamp('2015-01-02 00:00:00'): 72640.0,
        Timestamp('2015-01-03 00:00:00'): 109354.0,
        Timestamp('2015-01-04 00:00:00'): 144491.0,
        Timestamp('2015-01-05 00:00:00'): 180355.0,
        Timestamp('2015-01-06 00:00:00'): 214615.0,
        Timestamp('2015-01-07 00:00:00'): 250096.0,
        Timestamp('2015-01-08 00:00:00'): 287880.0,
        Timestamp('2015-01-09 00:00:00'): 332528.0,
        Timestamp('2015-01-10 00:00:00'): 381460.0,
        Timestamp('2015-01-11 00:00:00'): 422981.0,
        Timestamp('2015-01-12 00:00:00'): 463539.0,
        Timestamp('2015-01-13 00:00:00'): 505395.0,
        Timestamp('2015-01-14 00:00:00'): 549027.0,
        Timestamp('2015-01-15 00:00:00'): 595377.0,
        Timestamp('2015-01-16 00:00:00'): 649043.0,
        Timestamp('2015-01-17 00:00:00'): 707727.0,
        Timestamp('2015-01-18 00:00:00'): 761287.0,
        Timestamp('2015-01-19 00:00:00'): 814372.0,
        Timestamp('2015-01-20 00:00:00'): 867096.0,
        Timestamp('2015-01-21 00:00:00'): 920838.0,
        Timestamp('2015-01-22 00:00:00'): 983405.0,
        Timestamp('2015-01-23 00:00:00'): 1067243.0,
        Timestamp('2015-01-24 00:00:00'): 1164421.0,
        Timestamp('2015-01-25 00:00:00'): 1252178.0,
        Timestamp('2015-01-26 00:00:00'): 1341484.0,
        Timestamp('2015-01-27 00:00:00'): 1427600.0,
        Timestamp('2015-01-28 00:00:00'): 1511549.0,
        Timestamp('2015-01-29 00:00:00'): 1594846.0,
        Timestamp('2015-01-30 00:00:00'): 1694226.0,
        Timestamp('2015-01-31 00:00:00'): 1806727.0,
        Timestamp('2015-02-01 00:00:00'): 1899880.0,
        Timestamp('2015-02-02 00:00:00'): 1987978.0,
        Timestamp('2015-02-03 00:00:00'): 2080338.0,
        Timestamp('2015-02-04 00:00:00'): 2175775.0,
        Timestamp('2015-02-05 00:00:00'): 2279525.0,
        Timestamp('2015-02-06 00:00:00'): 2403306.0,
        Timestamp('2015-02-07 00:00:00'): 2545696.0,
        Timestamp('2015-02-08 00:00:00'): 2672464.0,
        Timestamp('2015-02-09 00:00:00'): 2794788.0,
    },
    'count_b': {
        Timestamp('2015-01-01 00:00:00'): nan,
        Timestamp('2015-01-02 00:00:00'): nan,
        Timestamp('2015-01-03 00:00:00'): nan,
        Timestamp('2015-01-04 00:00:00'): nan,
        Timestamp('2015-01-05 00:00:00'): nan,
        Timestamp('2015-01-06 00:00:00'): nan,
        Timestamp('2015-01-07 00:00:00'): nan,
        Timestamp('2015-01-08 00:00:00'): nan,
        Timestamp('2015-01-09 00:00:00'): nan,
        Timestamp('2015-01-10 00:00:00'): nan,
        Timestamp('2015-01-11 00:00:00'): nan,
        Timestamp('2015-01-12 00:00:00'): nan,
        Timestamp('2015-01-13 00:00:00'): nan,
        Timestamp('2015-01-14 00:00:00'): nan,
        Timestamp('2015-01-15 00:00:00'): nan,
        Timestamp('2015-01-16 00:00:00'): nan,
        Timestamp('2015-01-17 00:00:00'): nan,
        Timestamp('2015-01-18 00:00:00'): nan,
        Timestamp('2015-01-19 00:00:00'): nan,
        Timestamp('2015-01-20 00:00:00'): nan,
        Timestamp('2015-01-21 00:00:00'): nan,
        Timestamp('2015-01-22 00:00:00'): nan,
        Timestamp('2015-01-23 00:00:00'): nan,
        Timestamp('2015-01-24 00:00:00'): 71.0,
        Timestamp('2015-01-25 00:00:00'): 150.0,
        Timestamp('2015-01-26 00:00:00'): 236.0,
        Timestamp('2015-01-27 00:00:00'): 345.0,
        Timestamp('2015-01-28 00:00:00'): 1239.0,
        Timestamp('2015-01-29 00:00:00'): 2228.0,
        Timestamp('2015-01-30 00:00:00'): 7094.0,
        Timestamp('2015-01-31 00:00:00'): 16593.0,
        Timestamp('2015-02-01 00:00:00'): 27190.0,
        Timestamp('2015-02-02 00:00:00'): 37519.0,
        Timestamp('2015-02-03 00:00:00'): 49003.0,
        Timestamp('2015-02-04 00:00:00'): 63323.0,
        Timestamp('2015-02-05 00:00:00'): 79846.0,
        Timestamp('2015-02-06 00:00:00'): 101568.0,
        Timestamp('2015-02-07 00:00:00'): 127120.0,
        Timestamp('2015-02-08 00:00:00'): 149955.0,
        Timestamp('2015-02-09 00:00:00'): 171440.0,
    },
})
scrollex asked Jan 18 '16 01:01



People also ask

How do you subtract consecutive rows in pandas?

You can use the shift method to subtract between rows. The pandas shift method offers a preliminary step to calculating the difference between two rows by letting you see the shifted data directly. The pandas diff method simply calculates the difference, abstracting away that calculation.
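
To make the relationship between shift and diff concrete, here is a minimal sketch. It uses only the first four count_a values from the question; the variable names (previous, manual_diff, builtin_diff) are just illustrative choices.

```python
import pandas as pd

# Small illustrative frame: the first four count_a values from the question.
df = pd.DataFrame(
    {"count_a": [34175.0, 72640.0, 109354.0, 144491.0]},
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)

# shift(1) moves every value down one row, so each row lines up
# with the value from the preceding row (the first row becomes NaN).
previous = df.shift(1)

# Subtracting the shifted frame gives the row-over-row difference...
manual_diff = df - previous

# ...which is what diff() computes in a single call.
builtin_diff = df.diff()

print(manual_diff)
print(manual_diff.equals(builtin_diff))  # True
```

Inspecting previous is handy when you want to check which values are being paired up before trusting the result.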

How do you subtract two rows in a data frame?

Use the subtract() function to subtract the corresponding element of a Series from each element of a DataFrame.
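
As a rough illustration of that, the sketch below subtracts a Series from a DataFrame column by column. The numbers and the name offsets are made up for the example and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
df = pd.DataFrame({"count_a": [10.0, 20.0, 30.0], "count_b": [1.0, 2.0, 3.0]})

# One value per column; the Series index is matched against the column labels.
offsets = pd.Series({"count_a": 10.0, "count_b": 1.0})

# axis="columns" aligns the Series with the DataFrame's columns,
# so 10.0 is subtracted from count_a and 1.0 from count_b in every row.
result = df.subtract(offsets, axis="columns")

print(result)
```

Note that this subtracts a fixed Series from every row; for differences between consecutive rows, diff or shift is the more direct tool.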

How do I compare row values in pandas?

You can use the DataFrame.diff() function to find the difference between two rows in a pandas DataFrame. Its periods parameter sets how many rows back to look when calculating the difference (the default is 1).
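
A short sketch of the periods parameter, again using the first four count_a values from the question (the names one_day and two_day are illustrative):

```python
import pandas as pd

s = pd.Series(
    [34175.0, 72640.0, 109354.0, 144491.0],
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
    name="count_a",
)

# Default periods=1: each value minus the value one row earlier.
one_day = s.diff()

# periods=2: each value minus the value two rows earlier,
# so the first two rows have nothing to compare against and are NaN.
two_day = s.diff(periods=2)

print(one_day)
print(two_day)
```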


1 Answer

diff should give the desired result:

>>> df.diff()
            count_a  count_b
2015-01-01      NaN      NaN
2015-01-02    38465      NaN
2015-01-03    36714      NaN
2015-01-04    35137      NaN
2015-01-05    35864      NaN
....
2015-02-07   142390    25552
2015-02-08   126768    22835
2015-02-09   122324    21485
Mike Müller answered Sep 24 '22 11:09

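
For anyone who wants to try this on a small scale, here is a self-contained sketch that is not part of the original answer. It uses four consecutive days copied from the question's data, around the point where count_b stops being NaN. Keeping the differences next to the original columns with a "_diff" suffix is just one possible layout, and the names daily_change and combined are illustrative.

```python
import numpy as np
import pandas as pd

# Four consecutive days copied from the question's data, spanning the
# point where count_b first has a value.
df = pd.DataFrame(
    {
        "count_a": [1067243.0, 1164421.0, 1252178.0, 1341484.0],
        "count_b": [np.nan, 71.0, 150.0, 236.0],
    },
    index=pd.date_range("2015-01-23", periods=4, freq="D"),
)

# diff() works column by column: each row minus the preceding row.
daily_change = df.diff()

# Optionally keep the originals and the differences side by side.
# The "_diff" suffix is only a naming choice for this sketch.
combined = df.join(daily_change.add_suffix("_diff"))

print(combined)
```

The first row of each difference column is NaN because there is no preceding row, and count_b's difference for 2015-01-24 is also NaN because the preceding count_b value is itself NaN.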