I've got a data frame, df, with three columns: count_a, count_b and date; the counts are floats, and the dates are consecutive days in 2015.
I'm trying to figure out the difference between each day's counts in both the count_a and count_b columns; that is, I want to calculate the difference between each row and the preceding row for both of those columns. I've set the date as the index, but am having trouble figuring out how to do this. I found a couple of hints about using pd.Series and pd.DataFrame.diff, but I haven't had any luck finding an applicable answer or set of instructions.
I'm a bit stuck, and would appreciate some guidance here.
Here's what my data frame looks like:
import pandas as pd
from pandas import Timestamp
from numpy import nan
df = pd.DataFrame({'count_a': {Timestamp('2015-01-01 00:00:00'): 34175.0, Timestamp('2015-01-02 00:00:00'): 72640.0, Timestamp('2015-01-03 00:00:00'): 109354.0, Timestamp('2015-01-04 00:00:00'): 144491.0, Timestamp('2015-01-05 00:00:00'): 180355.0, Timestamp('2015-01-06 00:00:00'): 214615.0, Timestamp('2015-01-07 00:00:00'): 250096.0, Timestamp('2015-01-08 00:00:00'): 287880.0, Timestamp('2015-01-09 00:00:00'): 332528.0, Timestamp('2015-01-10 00:00:00'): 381460.0, Timestamp('2015-01-11 00:00:00'): 422981.0, Timestamp('2015-01-12 00:00:00'): 463539.0, Timestamp('2015-01-13 00:00:00'): 505395.0, Timestamp('2015-01-14 00:00:00'): 549027.0, Timestamp('2015-01-15 00:00:00'): 595377.0, Timestamp('2015-01-16 00:00:00'): 649043.0, Timestamp('2015-01-17 00:00:00'): 707727.0, Timestamp('2015-01-18 00:00:00'): 761287.0, Timestamp('2015-01-19 00:00:00'): 814372.0, Timestamp('2015-01-20 00:00:00'): 867096.0, Timestamp('2015-01-21 00:00:00'): 920838.0, Timestamp('2015-01-22 00:00:00'): 983405.0, Timestamp('2015-01-23 00:00:00'): 1067243.0, Timestamp('2015-01-24 00:00:00'): 1164421.0, Timestamp('2015-01-25 00:00:00'): 1252178.0, Timestamp('2015-01-26 00:00:00'): 1341484.0, Timestamp('2015-01-27 00:00:00'): 1427600.0, Timestamp('2015-01-28 00:00:00'): 1511549.0, Timestamp('2015-01-29 00:00:00'): 1594846.0, Timestamp('2015-01-30 00:00:00'): 1694226.0, Timestamp('2015-01-31 00:00:00'): 1806727.0, Timestamp('2015-02-01 00:00:00'): 1899880.0, Timestamp('2015-02-02 00:00:00'): 1987978.0, Timestamp('2015-02-03 00:00:00'): 2080338.0, Timestamp('2015-02-04 00:00:00'): 2175775.0, Timestamp('2015-02-05 00:00:00'): 2279525.0, Timestamp('2015-02-06 00:00:00'): 2403306.0, Timestamp('2015-02-07 00:00:00'): 2545696.0, Timestamp('2015-02-08 00:00:00'): 2672464.0, Timestamp('2015-02-09 00:00:00'): 2794788.0}, 'count_b': {Timestamp('2015-01-01 00:00:00'): nan, Timestamp('2015-01-02 00:00:00'): nan, Timestamp('2015-01-03 00:00:00'): nan, Timestamp('2015-01-04 00:00:00'): nan, Timestamp('2015-01-05 00:00:00'): nan, Timestamp('2015-01-06 00:00:00'): nan, Timestamp('2015-01-07 00:00:00'): nan, Timestamp('2015-01-08 00:00:00'): nan, Timestamp('2015-01-09 00:00:00'): nan, Timestamp('2015-01-10 00:00:00'): nan, Timestamp('2015-01-11 00:00:00'): nan, Timestamp('2015-01-12 00:00:00'): nan, Timestamp('2015-01-13 00:00:00'): nan, Timestamp('2015-01-14 00:00:00'): nan, Timestamp('2015-01-15 00:00:00'): nan, Timestamp('2015-01-16 00:00:00'): nan, Timestamp('2015-01-17 00:00:00'): nan, Timestamp('2015-01-18 00:00:00'): nan, Timestamp('2015-01-19 00:00:00'): nan, Timestamp('2015-01-20 00:00:00'): nan, Timestamp('2015-01-21 00:00:00'): nan, Timestamp('2015-01-22 00:00:00'): nan, Timestamp('2015-01-23 00:00:00'): nan, Timestamp('2015-01-24 00:00:00'): 71.0, Timestamp('2015-01-25 00:00:00'): 150.0, Timestamp('2015-01-26 00:00:00'): 236.0, Timestamp('2015-01-27 00:00:00'): 345.0, Timestamp('2015-01-28 00:00:00'): 1239.0, Timestamp('2015-01-29 00:00:00'): 2228.0, Timestamp('2015-01-30 00:00:00'): 7094.0, Timestamp('2015-01-31 00:00:00'): 16593.0, Timestamp('2015-02-01 00:00:00'): 27190.0, Timestamp('2015-02-02 00:00:00'): 37519.0, Timestamp('2015-02-03 00:00:00'): 49003.0, Timestamp('2015-02-04 00:00:00'): 63323.0, Timestamp('2015-02-05 00:00:00'): 79846.0, Timestamp('2015-02-06 00:00:00'): 101568.0, Timestamp('2015-02-07 00:00:00'): 127120.0, Timestamp('2015-02-08 00:00:00'): 149955.0, Timestamp('2015-02-09 00:00:00'): 171440.0}})
One option is the shift method: df.shift(1) moves every row down by one position, so df - df.shift(1) subtracts each row's predecessor from it. shift is a preliminary step that lets you look at the shifted data directly before subtracting, while the diff method simply calculates the difference and hides that intermediate step.
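To make the shift approach concrete, here is a minimal sketch. The frame toy and its values are made up purely for illustration (they are not the data from the question); the point is only that subtracting the shifted frame gives the same result as diff.

```python
import pandas as pd

# Hypothetical three-day frame, used only for illustration.
toy = pd.DataFrame(
    {"count_a": [10.0, 25.0, 45.0], "count_b": [1.0, 4.0, 9.0]},
    index=pd.date_range("2015-01-01", periods=3, freq="D"),
)

previous = toy.shift(1)    # each row now holds the previous day's values
by_shift = toy - previous  # explicit row-minus-previous-row subtraction
by_diff = toy.diff()       # the same calculation in a single call

print(previous)
print(by_shift)
```

The first row of both results is NaN, because the first day has no predecessor to subtract.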
Note that the subtract() function does something different: it subtracts a series (or another dataframe) from a dataframe element by element, rather than differencing consecutive rows.
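For comparison, a small sketch of what subtract() does with a series. The frame toy and the name baseline are hypothetical; here the first row is used as the series, so every row is expressed relative to day one instead of relative to the previous day.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
toy = pd.DataFrame({"count_a": [10.0, 25.0, 45.0], "count_b": [1.0, 4.0, 9.0]})

baseline = toy.iloc[0]  # a Series indexed by column name
# axis="columns" matches the Series index against the column labels,
# so baseline is subtracted from every row.
relative = toy.subtract(baseline, axis="columns")

print(relative)
```

This gives the change since the first row, which is not the day-over-day difference the question asks for.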
You can use the DataFrame.diff() function to find the difference between two rows in a pandas DataFrame. Its periods argument sets how many rows back to look when calculating the difference; the default is 1, which compares each row with the one directly before it.
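A short sketch of how periods changes the result, again on a made-up frame (toy and its values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical four-day frame, used only for illustration.
toy = pd.DataFrame(
    {"count_a": [10.0, 25.0, 45.0, 70.0]},
    index=pd.date_range("2015-01-01", periods=4, freq="D"),
)

one_day = toy.diff()           # periods=1: row minus the previous row
two_day = toy.diff(periods=2)  # row minus the row two positions earlier

print(one_day)
print(two_day)
```

With periods=2 the first two rows are NaN, since neither has a row two positions before it.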
diff should give the desired result:
>>> df.diff()
            count_a  count_b
2015-01-01      NaN      NaN
2015-01-02    38465      NaN
2015-01-03    36714      NaN
2015-01-04    35137      NaN
2015-01-05    35864      NaN
....
2015-02-07   142390    25552
2015-02-08   126768    22835
2015-02-09   122324    21485
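If you want to check this without pasting the whole frame, here is a sketch that rebuilds only the last four days from the question (2015-02-06 to 2015-02-09) and runs diff on them. The names tail, daily and daily_clean are mine, and dropping the all-NaN first row is an optional extra step, not part of the answer above.

```python
import pandas as pd

# Last four days of the question's data, indexed by date.
tail = pd.DataFrame(
    {
        "count_a": [2403306.0, 2545696.0, 2672464.0, 2794788.0],
        "count_b": [101568.0, 127120.0, 149955.0, 171440.0],
    },
    index=pd.date_range("2015-02-06", periods=4, freq="D"),
)

daily = tail.diff()  # each day minus the day before

# In this slice 2015-02-06 has no predecessor, so its row is all NaN;
# dropna(how="all") removes only rows where every column is NaN.
daily_clean = daily.dropna(how="all")

print(daily_clean)
```

The three remaining rows should match the last three rows of the output shown above.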