In Python, how can I reference the previous row and calculate something against it? Specifically, I am working with dataframes in pandas. I have a data frame full of stock price information that looks like this:
               Date   Close  Adj Close
    251  2011-01-03  147.48     143.25
    250  2011-01-04  147.64     143.41
    249  2011-01-05  147.05     142.83
    248  2011-01-06  148.66     144.40
    247  2011-01-07  147.93     143.69
Here is how I created this dataframe:
    import pandas

    url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
    data = pandas.read_csv(url)

    # Sort the data frame ascending by date.
    # (DataFrame.sort(columns=...) has been removed from pandas; sort_values is the current method.)
    data = data.sort_values(by='Date')
Starting with row number 2, or in this case, I guess it's 250 (PS - is that the index?), I want to calculate the difference between 2011-01-03 and 2011-01-04, and so on for every entry in this dataframe. I believe the appropriate way is to write a function that takes the current row, figures out the previous row, and calculates the difference between them, and then to use the pandas apply function to update the dataframe with the value.
Is that the right approach? If so, should I be using the index to determine the difference? (note - I'm still in python beginner mode, so index may not be the right term, nor even the correct way to implement this)
pandas provides the diff() function, which calculates the difference between consecutive DataFrame elements. Its periods parameter is an integer giving the number of periods to shift when computing the difference.
The shift method can also be used to subtract between rows. It works as a preliminary step to calculating the difference: it lines each row up with the previous one, so you can see the data directly and do the subtraction yourself. The diff method simply calculates the difference, abstracting that calculation away.
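To make the relationship between shift and diff concrete, here is a minimal sketch using the five closing prices quoted in the question, typed in by hand as sample data. The variable names are illustrative only.

```python
import pandas as pd

# Sample data: the five closing prices quoted in the question (entered by hand).
close = pd.Series(
    [147.48, 147.64, 147.05, 148.66, 147.93],
    index=["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06", "2011-01-07"],
    name="Close",
)

# shift(1) moves every value down one row, so each row is paired
# with the value from the previous row (the first row gets NaN).
previous = close.shift(1)

# Subtracting the shifted series by hand ...
manual_diff = close - previous

# ... computes the same thing as the built-in diff().
builtin_diff = close.diff()

# Side-by-side view of the intermediate step and the result.
print(pd.DataFrame({"Close": close, "previous": previous, "diff": builtin_diff}))
```

The shift version is handy when you want to inspect the previous value or combine it in some other way (a ratio, for instance); diff is shorter when a plain subtraction is all you need.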
For a pandas Series, diff() computes the first discrete difference: the difference between each element and another element in the Series (by default, the element in the previous row). The periods argument sets the number of periods to shift when calculating the difference, and it accepts negative values.
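A short sketch of how the periods argument changes which element is subtracted. The series values are made up for illustration.

```python
import pandas as pd

# Illustrative values only.
s = pd.Series([10, 45, 26, 32])

# Default (periods=1): each element minus the one in the previous row.
d_prev = s.diff()
print(d_prev.tolist())   # [nan, 35.0, -19.0, 6.0]

# periods=2: each element minus the one two rows earlier.
d_two = s.diff(periods=2)
print(d_two.tolist())    # [nan, nan, 16.0, -13.0]

# periods=-1: each element minus the one in the next row.
d_next = s.diff(periods=-1)
print(d_next.tolist())   # [-35.0, 19.0, -6.0, nan]
```

Rows that have no partner to subtract (the first rows for positive periods, the last rows for negative periods) come back as NaN.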
The equals() function is used to test whether two Pandas objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
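As a small illustration of the NaN behaviour described above, the following sketch compares equals() with an element-wise comparison. The three series are made-up examples.

```python
import numpy as np
import pandas as pd

# Made-up example series; a and b are identical, c has an extra element.
a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])
c = pd.Series([1.0, np.nan, 3.0, 4.0])

# equals() treats NaNs in the same position as equal.
same = a.equals(b)

# An element-wise == does not: NaN == NaN is False, so .all() fails.
elementwise_all = (a == b).all()

# Different shapes are never equal.
same_shape = a.equals(c)

print(same, elementwise_all, same_shape)  # True False False
```

This matters for diff() results in particular, since their first row is always NaN.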
I think you want to do something like this:
    In [26]: data
    Out[26]:
               Date   Close  Adj Close
    251  2011-01-03  147.48     143.25
    250  2011-01-04  147.64     143.41
    249  2011-01-05  147.05     142.83
    248  2011-01-06  148.66     144.40
    247  2011-01-07  147.93     143.69

    In [27]: data.set_index('Date').diff()
    Out[27]:
                Close  Adj Close
    Date
    2011-01-03    NaN        NaN
    2011-01-04   0.16       0.16
    2011-01-05  -0.59      -0.58
    2011-01-06   1.61       1.57
    2011-01-07  -0.73      -0.71
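If you would rather keep the difference as a new column next to the original prices, something along these lines should work. This is a sketch that rebuilds the five rows by hand instead of downloading them, and the column name Close_change is a hypothetical choice.

```python
import pandas as pd

# Hand-built copy of the five rows from the question, including the
# descending integer index (251..247) left over from the original file order.
data = pd.DataFrame(
    {
        "Date": ["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06", "2011-01-07"],
        "Close": [147.48, 147.64, 147.05, 148.66, 147.93],
        "Adj Close": [143.25, 143.41, 142.83, 144.40, 143.69],
    },
    index=[251, 250, 249, 248, 247],
)

# diff() works on row order, not on the index labels, so the labels
# 251, 250, ... do not matter once the rows are sorted by date.
# "Close_change" is a hypothetical column name.
data["Close_change"] = data["Close"].diff()

print(data)
```

No apply call or hand-written row function is needed here; diff() does the previous-row lookup for the whole column at once.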
To calculate the difference for a single column, here is what you can do.
    df =
        A   B
    0  10  56
    1  45  48
    2  26  48
    3  32  65
We want to compute the row-to-row difference in column A only, and then keep only the rows where that difference is less than 15.
    df['A_dif'] = df['A'].diff()

    df =
        A   B  A_dif
    0  10  56    NaN
    1  45  48   35.0
    2  26  48  -19.0
    3  32  65    6.0

    # NaN < 15 evaluates to False, so row 0 is dropped by the filter
    df = df[df['A_dif'] < 15]

    df =
        A   B  A_dif
    2  26  48  -19.0
    3  32  65    6.0
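A runnable sketch of these steps follows, using the same four rows. It also shows one possible way to keep the first row, which has no previous row to compare against; that variant and the names below and below_or_first are hypothetical additions, not part of the answer above.

```python
import pandas as pd

# The four example rows from the answer.
df = pd.DataFrame({"A": [10, 45, 26, 32], "B": [56, 48, 48, 65]})

# Row-to-row difference in column A only; the first row has no
# previous row, so its difference is NaN.
df["A_dif"] = df["A"].diff()

# Comparisons against NaN are False, so the first row is filtered out here.
below = df[df["A_dif"] < 15]
print(below.index.tolist())           # [2, 3]

# Hypothetical variant: also keep rows whose difference is undefined.
below_or_first = df[(df["A_dif"] < 15) | df["A_dif"].isna()]
print(below_or_first.index.tolist())  # [0, 2, 3]
```

If the sign of the change should not matter, filtering on df["A_dif"].abs() instead would be the natural adjustment.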