
Calculating difference between two rows in Python / Pandas

Tags: python, pandas

In Python, how can I reference the previous row and calculate something against it? Specifically, I am working with DataFrames in pandas. I have a DataFrame full of stock price information that looks like this:

           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

Here is how I created this dataframe:

import pandas

url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
data = pandas.read_csv(url)

## now I sorted the data frame ascending by date
## (sort_values replaces the old DataFrame.sort, which newer pandas no longer provides)
data = data.sort_values(by='Date')

Starting with row number 2, or in this case, I guess it's 250 (PS - is that the index?), I want to calculate the difference between 2011-01-03 and 2011-01-04, and so on for every entry in this dataframe. I believe the appropriate way is to write a function that takes the current row, figures out the previous row, and calculates the difference between them, then use the pandas apply function to update the dataframe with the value.

Is that the right approach? If so, should I be using the index to determine the difference? (Note: I'm still in Python beginner mode, so index may not be the right term, nor even the correct way to implement this.)

mikebmassey asked Oct 29 '12 00:10


People also ask

How do you find the difference between consecutive rows in pandas?

Use the diff() function. It calculates the difference between consecutive DataFrame elements. Its periods parameter is an integer giving the number of rows to shift when computing the difference.

How do you subtract two rows in Python?

We can use the shift method to subtract between rows. The pandas shift method offers a preliminary step to calculating the difference between two rows, letting you see the shifted data directly. The pandas diff method simply calculates the difference, abstracting away that step.

How do you tell the difference between two Series in pandas?

The Series diff() function computes the first discrete difference of elements: the difference between each Series element and another element in the Series (by default, the element in the previous row). The periods argument sets how many rows to shift for the calculation and accepts negative values.
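
The relationship between shift and diff described above can be sketched as follows. This is a minimal illustration, not part of the original answers: the Series reuses the Close values from the question, and the variable names (close, previous, manual_diff, builtin_diff, forward_diff) are made up for the example.

```python
import pandas as pd

# Illustrative closing prices copied from the sample rows in the question.
close = pd.Series(
    [147.48, 147.64, 147.05, 148.66, 147.93],
    index=["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06", "2011-01-07"],
    name="Close",
)

# shift(1) moves every value down one row, so each row lines up with its predecessor.
previous = close.shift(1)

# Subtracting the shifted Series is the manual form of diff().
manual_diff = close - previous
builtin_diff = close.diff()

# A negative periods value compares each row with the row after it instead.
forward_diff = close.diff(periods=-1)

print(pd.DataFrame({"previous": previous, "manual": manual_diff, "diff": builtin_diff}))
```

With periods=1 (the default) the first row has no predecessor and comes out as NaN; with periods=-1 it is the last row that comes out as NaN.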

How do I check if two rows have the same value in pandas?

The equals() function is used to test whether two Pandas objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.


2 Answers

I think you want to do something like this:

In [26]: data
Out[26]:
           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

In [27]: data.set_index('Date').diff()
Out[27]:
             Close  Adj Close
Date
2011-01-03    NaN        NaN
2011-01-04   0.16       0.16
2011-01-05  -0.59      -0.58
2011-01-06   1.61       1.57
2011-01-07  -0.73      -0.71
Chang She answered Oct 07 '22 17:10
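
For anyone who wants to try the set_index/diff approach without downloading anything, here is a self-contained sketch. It is an assumption-laden reconstruction rather than the answerer's own code: the five rows are typed in from the question's sample, the frame starts in newest-first order to mimic the raw download, and the name changes is made up for the example.

```python
import pandas as pd

# Sample rows copied from the question, in newest-first order with the
# same integer index labels (247..251) that the question shows.
data = pd.DataFrame(
    {
        "Date": ["2011-01-07", "2011-01-06", "2011-01-05", "2011-01-04", "2011-01-03"],
        "Close": [147.93, 148.66, 147.05, 147.64, 147.48],
        "Adj Close": [143.69, 144.40, 142.83, 143.41, 143.25],
    },
    index=[247, 248, 249, 250, 251],
)

# Parse the dates and sort oldest-first so diff() subtracts the previous trading day.
data["Date"] = pd.to_datetime(data["Date"])
data = data.sort_values(by="Date")

# diff() works on row position, so the integer labels do not affect the result.
# round(2) only tidies floating-point noise for display.
changes = data.set_index("Date").diff().round(2)
print(changes)
```

The numbers on the left of the question's table (251, 250, ...) are the index labels. They stay attached to their rows when the frame is sorted, which is why they run backwards after sorting by date. No row-by-row apply function is needed, because diff() already compares each row with the one above it.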


To calculate the difference for a single column, you can do the following.

df=
        A      B
  0     10     56
  1     45     48
  2     26     48
  3     32     65

We want to compute the row-to-row difference in column A only, and then keep only the rows where that difference is less than 15.

df['A_dif'] = df['A'].diff()

df=
        A      B      A_dif
  0     10     56     NaN
  1     45     48     35.0
  2     26     48     -19.0
  3     32     65     6.0

# NaN < 15 is False, so row 0 is dropped; -19 < 15 is True, so row 2 is kept
df = df[df['A_dif']<15]

df=
        A      B      A_dif
  2     26     48     -19.0
  3     32     65     6.0
Msquare answered Oct 07 '22 17:10
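
Filtering on a diff() column has two details that are easy to overlook: the first row is always NaN, and a drop shows up as a negative number. The sketch below uses the same four-row frame to compare three possible filters. The absolute-value and keep-the-first-row variants are assumptions about what one might want, not something the answer specifies, and the names small_signed, small_abs and keep_first are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 45, 26, 32], "B": [56, 48, 48, 65]})
df["A_dif"] = df["A"].diff()

# Signed comparison: NaN < 15 is False and -19 < 15 is True.
small_signed = df[df["A_dif"] < 15]

# Assumed variant: compare the size of the change regardless of direction.
small_abs = df[df["A_dif"].abs() < 15]

# Assumed variant: also keep the first row, which has no previous row to compare with.
keep_first = df[(df["A_dif"].abs() < 15) | df["A_dif"].isna()]

print(small_signed.index.tolist())  # [2, 3]
print(small_abs.index.tolist())     # [3]
print(keep_first.index.tolist())    # [0, 3]
```

Which variant is right depends on whether a fall of 19 should count as a small change and whether the row with no predecessor should survive the filter.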