I have a DataFrame and need to calculate the percent change compared to the beginning of the year for each company. Is there any way to use pct_change() or another method to perform this task? Thanks!

df looks like:
```
security  date      price
IBM       1/1/2016  100
IBM       1/2/2016  102
IBM       1/3/2016  108
AAPL      1/1/2016  1000
AAPL      1/2/2016  980
AAPL      1/3/2016  1050
AAPL      1/4/2016  1070
```
Results I want:

```
security  date      price  change
IBM       1/1/2016  100    NA
IBM       1/2/2016  102    2%
IBM       1/3/2016  108    8%
AAPL      1/1/2016  1000   NA
AAPL      1/2/2016  980    -2%
AAPL      1/3/2016  1050   5%
AAPL      1/4/2016  1070   7%
```
Sounds like you are looking for an expanding-window version of pct_change(). This doesn't exist out of the box AFAIK, but you could roll your own:

```python
df.groupby('security')['price'].apply(lambda x: x.div(x.iloc[0]).subtract(1).mul(100))
```
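As a quick check, here is a minimal, self-contained sketch that builds the sample data from the question and runs the one-liner above (the variable names are my own):

```python
import pandas as pd

df = pd.DataFrame({
    'security': ['IBM', 'IBM', 'IBM', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
    'date': ['1/1/2016', '1/2/2016', '1/3/2016',
             '1/1/2016', '1/2/2016', '1/3/2016', '1/4/2016'],
    'price': [100, 102, 108, 1000, 980, 1050, 1070],
})

# Percent change of each price relative to the first price in its group
result = df.groupby('security')['price'].apply(
    lambda x: x.div(x.iloc[0]).subtract(1).mul(100)
)
print(result)  # 0.0, 2.0, 8.0 for IBM; 0.0, -2.0, 5.0, 7.0 for AAPL
```

If you want to attach this as a new column aligned with the original rows, swapping .apply for .transform with the same lambda keeps the original index.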
This works, assuming you're already ordered by date within each possible grouping:

```python
def pct_change(df):
    # Percent change of each price relative to the group's first price
    df['pct'] = 100 * (df.price / df.iloc[0].price - 1)
    return df

df.groupby('security').apply(pct_change)
```
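Since the ordering assumption matters, here is a hedged sketch of one way to enforce it before applying the function above (the pd.to_datetime and sort_values calls are my own additions):

```python
# Parse the date strings (month/day/year) and sort within each security,
# so the first row of every group really is the start of the year.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['security', 'date'])

# group_keys=False keeps the original index instead of adding the group label
df = df.groupby('security', group_keys=False).apply(pct_change)
```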
Late to the party, but I faced a similar problem and I'd like to share my solution with you, in case it's helpful for anybody.
```python
import numpy as np
import pandas as pd

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]      # first element in window iteration
    current = values.iloc[-1]  # last element in window iteration
    return (current - base) / base if base else 0  # avoid ZeroDivisionError

variations = df.groupby('security')['price'].expanding(min_periods=2).apply(get_variation)
df = df.assign(change=variations.droplevel(0))
```
| | security | date | price | change |
|---|---|---|---|---|
| 0 | IBM | 1/1/2016 | 100 | NaN |
| 1 | IBM | 1/2/2016 | 102 | 0.02 |
| 2 | IBM | 1/3/2016 | 108 | 0.08 |
| 3 | AAPL | 1/1/2016 | 1000 | NaN |
| 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
You can do what you are asking by:

1. Defining a custom variation function
2. Grouping the pd.DataFrame by the security column
3. Performing an expanding window operation on the price column
4. Calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument
5. Merging the results back into the original pd.DataFrame

Optionally, you can replace the expanding window operation in step 3 with a rolling window operation by calling .rolling(window=2, min_periods=2), to get a step-by-step variation of each security price (see the sketch after this paragraph). window=2 sets the size of the window so that each iteration receives two elements, and min_periods=2 sets the minimum amount of data needed for a calculation (otherwise the result is NaN).
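For illustration, a minimal sketch of that rolling variant, reusing the get_variation function defined in step 1 below (the variable name is my own):

```python
# Row-over-row variation: each window holds the previous and the current price
rolling_variations = (
    df.groupby('security')['price']
      .rolling(window=2, min_periods=2)
      .apply(get_variation)
)
print(rolling_variations.droplevel(0))
```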
1. Defining a custom variation function

Your variation function should accept a pd.Series as argument and use the first and last items in the collection to calculate the variation. Here, I'm using a standard variation formula used in finance to calculate interest rates. The last line has an if/else statement to avoid a ZeroDivisionError.

```python
def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]      # first element in the window
    current = values.iloc[-1]  # last element in the window
    return (current - base) / base if base else 0
```
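A quick sanity check of the function on small hand-built series (the numbers are taken from the IBM rows in the question):

```python
print(get_variation(pd.Series([100, 102])))       # 0.02
print(get_variation(pd.Series([100, 102, 108])))  # 0.08 (only first and last are used)
print(get_variation(pd.Series([0, 50])))          # 0, the guard avoids dividing by zero
```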
2. Grouping the pd.DataFrame by the security column

Next, you should call .groupby('security') to group the pd.DataFrame by the security column, in order to prepare the data for per-group calculations.

```python
grouped_df = df.groupby('security')
```
| security | | security | date | price | change |
|---|---|---|---|---|---|
| AAPL | 3 | AAPL | 1/1/2016 | 1000 | NaN |
| | 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| | 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| | 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
| IBM | 0 | IBM | 1/1/2016 | 100 | NaN |
| | 1 | IBM | 1/2/2016 | 102 | 0.02 |
| | 2 | IBM | 1/3/2016 | 108 | 0.08 |
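As a side note (not part of the original steps), you can peek at a single group at this point with .get_group:

```python
# Show only the IBM rows as stored inside the grouped object
print(grouped_df.get_group('IBM'))
```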
3. Performing an expanding window operation on the price column

Next, you should use an expanding window operation by calling .expanding(min_periods=2) on the price groups. This will iterate over each price group and give you all the data up to the current iteration as a pd.Series. You call .expanding(min_periods=n) to set n as the minimum number of observations required to return a value on each iteration (otherwise NaN). In your case, given that you want NaN on the first record because it is not compared to anything else, you should pass min_periods=2; if you would rather get 0 as a result instead, pass min_periods=1 (see the sketch below).

```python
windows = grouped_df['price'].expanding(min_periods=2)
```
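For comparison, a small sketch of the min_periods=1 variant mentioned above, which yields 0 instead of NaN on the first row of each group (the variable name is my own):

```python
# With a one-element window, get_variation returns (x - x) / x == 0
zero_based = grouped_df['price'].expanding(min_periods=1).apply(get_variation)
print(zero_based)
```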
4. Calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument

Calling .apply(get_variation) will apply your custom variation formula to each resulting window and return the result.

```python
grouped_variations = windows.apply(get_variation)
```
| security | | price |
|---|---|---|
| AAPL | 3 | NaN |
| | 4 | -0.02 |
| | 5 | 0.05 |
| | 6 | 0.07 |
| IBM | 0 | NaN |
| | 1 | 0.02 |
| | 2 | 0.08 |
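A side note for illustration: you can confirm the index structure of this result directly before dropping a level.

```python
# The outer level holds the group label, the inner level the original row index
print(grouped_variations.index)
# e.g. MultiIndex([('AAPL', 3), ('AAPL', 4), ...], names=['security', None])
```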
As you can see in the output of step 4, the data is presented with a MultiIndex. We get rid of the outer index level ('AAPL', 'IBM') by calling .droplevel(0), to prepare the data for merging it correctly into the original dataframe.

```python
variations = grouped_variations.droplevel(0)
```
5. Merging the results back into the original pd.DataFrame

Finally, we assign the price variations to the original dataframe by calling df.assign. The data will be joined into the destination on its index.

```python
df = df.assign(change=variations)
```
| | security | date | price | change |
|---|---|---|---|---|
| 0 | IBM | 1/1/2016 | 100 | NaN |
| 1 | IBM | 1/2/2016 | 102 | 0.02 |
| 2 | IBM | 1/3/2016 | 108 | 0.08 |
| 3 | AAPL | 1/1/2016 | 1000 | NaN |
| 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
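If you need the change column rendered as whole percentages like the desired output in the question ('2%', '-2%', ...), here is a hedged sketch of one way to format it (the change_pct column name and the rounding choice are my own):

```python
# Multiply the fraction by 100 and format it as a percent string, keeping NaN as-is
df['change_pct'] = df['change'].mul(100).round(0).map(
    lambda v: f'{v:.0f}%' if pd.notna(v) else v
)
print(df)
```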