 

How to calculate percent change compared to the beginning value using pandas?

I have a DataFrame and need to calculate percent change compared to the beginning of the year by companies. Is there any way to use pct_change() or other method to perform this task? Thanks!

df looks like

security    date        price
IBM         1/1/2016    100
IBM         1/2/2016    102
IBM         1/3/2016    108
AAPL        1/1/2016    1000
AAPL        1/2/2016    980
AAPL        1/3/2016    1050
AAPL        1/4/2016    1070

results I want

security    date        price   change
IBM         1/1/2016    100     NA
IBM         1/2/2016    102     2%
IBM         1/3/2016    108     8%
AAPL        1/1/2016    1000    NA
AAPL        1/2/2016    980     -2%
AAPL        1/3/2016    1050    5%
AAPL        1/4/2016    1070    7%
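For anyone who wants to reproduce the answers below, the sample frame can be built like this (dates are kept as plain strings here; parsing them with pd.to_datetime is optional):

```python
import pandas as pd

# Sample data from the question above
df = pd.DataFrame({
    'security': ['IBM', 'IBM', 'IBM', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
    'date': ['1/1/2016', '1/2/2016', '1/3/2016',
             '1/1/2016', '1/2/2016', '1/3/2016', '1/4/2016'],
    'price': [100, 102, 108, 1000, 980, 1050, 1070],
})
```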
asked Jan 29 '16 by E.K.

3 Answers

Sounds like you are looking for an expanding-window version of pct_change(). This doesn't exist out of the box as far as I know, but you can roll your own:

df.groupby('security')['price'].apply(lambda x: x.div(x.iloc[0]).subtract(1).mul(100))
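To attach the result as a new column in one step, a transform-based variant works too (a sketch; like the apply above, it yields 0 rather than NaN for each group's first row, since transform keeps the result aligned with the original index):

```python
import pandas as pd

df = pd.DataFrame({
    'security': ['IBM', 'IBM', 'IBM', 'AAPL'],
    'price': [100, 102, 108, 1000],
})

# transform returns a Series aligned with df's index, so it can be
# assigned directly as a new column
df['change'] = df.groupby('security')['price'].transform(
    lambda x: x.div(x.iloc[0]).sub(1).mul(100)
)
```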
answered Sep 24 '22 by Stefan

This works, assuming the rows within each group are already sorted by date.

def pct_change(df):
    # percent change relative to each group's first (earliest) price
    df['pct'] = 100 * (df.price / df.iloc[0].price - 1)
    return df

df.groupby('security').apply(pct_change)
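Run end to end on a small frame, this looks like the following (a sketch: the formula is written in the conventional price / base - 1 form so the numbers match the asker's expected output, and group_keys=False is an assumption to keep the original flat index on newer pandas instead of prepending the group keys):

```python
import pandas as pd

def pct_change(df):
    # percent change relative to each group's first (earliest) price
    df['pct'] = 100 * (df.price / df.iloc[0].price - 1)
    return df

df = pd.DataFrame({
    'security': ['IBM', 'IBM', 'AAPL', 'AAPL'],
    'price': [100, 108, 1000, 1070],
})

out = df.groupby('security', group_keys=False).apply(pct_change)
```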
answered Sep 22 '22 by fivetentaylor

Late to the party, but I faced a similar problem and I'd like to share my solution with you, in case it's helpful for anybody.

TL;DR

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]  # first element in window iteration
    current = values.iloc[-1]  # last element in window iteration

    return (current - base) / base if base else 0  # avoid ZeroDivisionError

variations = df.groupby('security')['price'].expanding(min_periods=2).apply(get_variation)
df = df.assign(change=variations.droplevel(0))
security date price change
0 IBM 1/1/2016 100 NaN
1 IBM 1/2/2016 102 0.02
2 IBM 1/3/2016 108 0.08
3 AAPL 1/1/2016 1000 NaN
4 AAPL 1/2/2016 980 -0.02
5 AAPL 1/3/2016 1050 0.05
6 AAPL 1/4/2016 1070 0.07

You can do what you are asking by:

  1. defining your own variation function,
  2. grouping pd.DataFrame by security column,
  3. using an expanding window operation on the resulting price column,
  4. calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument,
  5. dropping outer index of resulting series,
  6. assigning previous result to original pd.DataFrame.

Optionally, you can replace the expanding window operation in step 3 with a rolling window by calling .rolling(window=2, min_periods=2), to get a step-by-step variation of each security's price. window=2 sets the window size so each iteration sees two elements, and min_periods=2 sets the minimum number of observations required for a calculation (otherwise the result is NaN).
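As a quick sketch of that rolling variant (same get_variation as in the TL;DR), computing the period-over-period variation for a single security:

```python
import pandas as pd
import numpy as np

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]
    current = values.iloc[-1]
    return (current - base) / base if base else 0

df = pd.DataFrame({
    'security': ['IBM', 'IBM', 'IBM'],
    'price': [100, 102, 108],
})

# window=2 -> each window holds the previous and the current price,
# so get_variation returns the step-by-step change
step = (df.groupby('security')['price']
          .rolling(window=2, min_periods=2)
          .apply(get_variation)
          .droplevel(0))
```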

Step by step

1. Defining your own variation function

Your variation function should accept a pd.Series as argument and use the first and last items in the collection to calculate the variation. Here, I'm using the standard rate-of-return formula from finance. The last line has an if/else to avoid a ZeroDivisionError.

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]
    current = values.iloc[-1]

    return (current - base) / base if base else 0
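A quick sanity check of this function on hand-made Series (the numbers are taken from the IBM rows above):

```python
import pandas as pd
import numpy as np

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]
    current = values.iloc[-1]
    return (current - base) / base if base else 0

# Only the first and last elements matter: (108 - 100) / 100
print(get_variation(pd.Series([100, 102, 108])))  # -> 0.08

# A zero base short-circuits to 0 instead of raising ZeroDivisionError
print(get_variation(pd.Series([0, 50])))  # -> 0
```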

2. Grouping pd.DataFrame by security column

Next, you should call .groupby('security') to group the pd.DataFrame by the security column, preparing the data for per-group calculations.

grouped_df = df.groupby('security')
security date price change
security
AAPL 3 AAPL 1/1/2016 1000 NaN
4 AAPL 1/2/2016 980 -0.02
5 AAPL 1/3/2016 1050 0.05
6 AAPL 1/4/2016 1070 0.07
IBM 0 IBM 1/1/2016 100 NaN
1 IBM 1/2/2016 102 0.02
2 IBM 1/3/2016 108 0.08

3. Using an expanding window operation on the resulting price column

Next, you should use an expanding window operation by calling .expanding(min_periods=2) on the price groups. This iterates over each price group and hands you all data up to the current row as a pd.Series. The min_periods=n argument sets the minimum number of observations required to return a value on each iteration (NaN otherwise). In your case, since you want NaN on the first record because it has nothing to compare against, you should pass min_periods=2; if you'd rather get 0 instead, pass min_periods=1.

windows = grouped_df['price'].expanding(min_periods=2)

4. Calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument

Calling .apply(get_variation) will apply your custom variation formula to each resulting window and return the result.

grouped_variations = windows.apply(get_variation)
security
AAPL 3 NaN
4 -0.02
5 0.05
6 0.07
IBM 0 NaN
1 0.02
2 0.08

5. Dropping outer index of resulting series

As you can see in step 4, the data comes back with a multi-index. We get rid of the outer index level ('AAPL', 'IBM') by calling .droplevel(0), to prepare the data for merging it correctly into the original dataframe.

variations = grouped_variations.droplevel(0)
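A tiny illustration of what .droplevel(0) does here (the index values are made up to mirror the IBM rows):

```python
import pandas as pd

# Two-level index like the one produced by the grouped expanding apply
s = pd.Series(
    [float('nan'), 0.02, 0.08],
    index=pd.MultiIndex.from_tuples([('IBM', 0), ('IBM', 1), ('IBM', 2)]),
)

# Drop the outer 'security' level; only the original row labels remain
flat = s.droplevel(0)
```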

6. Assigning previous result to original pd.DataFrame

Finally, we assign the price variations into the original dataframe by calling df.assign. Data will be joined into the destination on its index.

df = df.assign(change=variations)
security date price change
0 IBM 1/1/2016 100 NaN
1 IBM 1/2/2016 102 0.02
2 IBM 1/3/2016 108 0.08
3 AAPL 1/1/2016 1000 NaN
4 AAPL 1/2/2016 980 -0.02
5 AAPL 1/3/2016 1050 0.05
6 AAPL 1/4/2016 1070 0.07
answered Sep 24 '22 by Gonzalo Ramírez Muñoz de Toro