I have a DataFrame and need to calculate the percent change compared to the beginning of the year for each company. Is there any way to use pct_change() or another method to perform this task? Thanks!

df looks like:
```
security  date      price
IBM       1/1/2016  100
IBM       1/2/2016  102
IBM       1/3/2016  108
AAPL      1/1/2016  1000
AAPL      1/2/2016  980
AAPL      1/3/2016  1050
AAPL      1/4/2016  1070
```
Results I want:

```
security  date      price  change
IBM       1/1/2016  100    NA
IBM       1/2/2016  102    2%
IBM       1/3/2016  108    8%
AAPL      1/1/2016  1000   NA
AAPL      1/2/2016  980    -2%
AAPL      1/3/2016  1050   5%
AAPL      1/4/2016  1070   7%
```
Sounds like you are looking for an expanding-window version of pct_change(). This doesn't exist out of the box AFAIK, but you could roll your own:

```python
df.groupby('security')['price'].apply(lambda x: x.div(x.iloc[0]).subtract(1).mul(100))
```
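As a quick check, here is a minimal, self-contained sketch that builds the sample data from the question and runs the one-liner above (the variable names are my own):

```python
import pandas as pd

df = pd.DataFrame({
    'security': ['IBM', 'IBM', 'IBM', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
    'date': ['1/1/2016', '1/2/2016', '1/3/2016',
             '1/1/2016', '1/2/2016', '1/3/2016', '1/4/2016'],
    'price': [100, 102, 108, 1000, 980, 1050, 1070],
})

# Percent change of each price relative to the first price in its group
result = df.groupby('security')['price'].apply(
    lambda x: x.div(x.iloc[0]).subtract(1).mul(100)
)
print(result)  # 0.0, 2.0, 8.0 for IBM; 0.0, -2.0, 5.0, 7.0 for AAPL
```

If you want to attach this as a new column aligned with the original rows, swapping .apply for .transform with the same lambda keeps the original index.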
This works, assuming you're already ordered by date within each possible grouping:

```python
def pct_change(df):
    # Percent change of each price relative to the group's first price
    df['pct'] = 100 * (df.price / df.iloc[0].price - 1)
    return df

df.groupby('security').apply(pct_change)
```
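Since the ordering assumption matters, here is a hedged sketch of one way to enforce it before applying the function above (the pd.to_datetime and sort_values calls are my own additions):

```python
# Parse the date strings (month/day/year) and sort within each security,
# so the first row of every group really is the start of the year.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['security', 'date'])

# group_keys=False keeps the original index instead of adding the group label
df = df.groupby('security', group_keys=False).apply(pct_change)
```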
Late to the party, but I faced a similar problem and I'd like to share my solution with you, in case it's helpful for anybody.
```python
import numpy as np
import pandas as pd

def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]      # first element in window iteration
    current = values.iloc[-1]  # last element in window iteration
    return (current - base) / base if base else 0  # avoid ZeroDivisionError

variations = df.groupby('security')['price'].expanding(min_periods=2).apply(get_variation)
df = df.assign(change=variations.droplevel(0))
```
| | security | date | price | change |
|---|---|---|---|---|
| 0 | IBM | 1/1/2016 | 100 | NaN |
| 1 | IBM | 1/2/2016 | 102 | 0.02 |
| 2 | IBM | 1/3/2016 | 108 | 0.08 |
| 3 | AAPL | 1/1/2016 | 1000 | NaN |
| 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
You can do what you are asking by:

1. Defining a custom variation function
2. Grouping the pd.DataFrame by the security column
3. Performing an expanding window operation on the price column
4. Calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument
5. Merging the results back into the original pd.DataFrame

Optionally, you can replace the expanding window operation in step 3 with a rolling window operation by calling .rolling(window=2, min_periods=2), to get a step-by-step variation of each security price (see the sketch after this paragraph). window=2 sets the size of the window so that each iteration receives two elements, and min_periods=2 sets the minimum amount of data needed for a calculation (otherwise the result is NaN).
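For illustration, a minimal sketch of that rolling variant, reusing the get_variation function defined in step 1 below (the variable name is my own):

```python
# Row-over-row variation: each window holds the previous and the current price
rolling_variations = (
    df.groupby('security')['price']
      .rolling(window=2, min_periods=2)
      .apply(get_variation)
)
print(rolling_variations.droplevel(0))
```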
1. Defining a custom variation function

Your variation function should accept a pd.Series as argument and use the first and last items in the collection to calculate the variation. Here, I'm using a standard variation formula used in finance to calculate interest rates. The last line has an if/else statement to avoid a ZeroDivisionError.

```python
def get_variation(values: pd.Series) -> np.float64:
    base = values.iloc[0]      # first element in the window
    current = values.iloc[-1]  # last element in the window
    return (current - base) / base if base else 0
```
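A quick sanity check of the function on small hand-built series (the numbers are taken from the IBM rows in the question):

```python
print(get_variation(pd.Series([100, 102])))       # 0.02
print(get_variation(pd.Series([100, 102, 108])))  # 0.08 (only first and last are used)
print(get_variation(pd.Series([0, 50])))          # 0, the guard avoids dividing by zero
```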
2. Grouping the pd.DataFrame by the security column

Next, you should call .groupby('security') to group the pd.DataFrame by the security column, in order to prepare the data for per-group calculations.

```python
grouped_df = df.groupby('security')
```
| security | | security | date | price | change |
|---|---|---|---|---|---|
| AAPL | 3 | AAPL | 1/1/2016 | 1000 | NaN |
| | 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| | 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| | 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
| IBM | 0 | IBM | 1/1/2016 | 100 | NaN |
| | 1 | IBM | 1/2/2016 | 102 | 0.02 |
| | 2 | IBM | 1/3/2016 | 108 | 0.08 |
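As a side note (not part of the original steps), you can peek at a single group at this point with .get_group:

```python
# Show only the IBM rows as stored inside the grouped object
print(grouped_df.get_group('IBM'))
```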
3. Performing an expanding window operation on the price column

Next, you should use an expanding window operation by calling .expanding(min_periods=2) on the price groups. This will iterate over each price group and give you all the data up to the current iteration as a pd.Series. You call .expanding(min_periods=n) to set n as the minimum number of observations required to return a value on each iteration (otherwise NaN). In your case, given that you want NaN on the first record because it is not compared to anything else, you should pass min_periods=2; if you would rather get 0 as a result instead, pass min_periods=1 (see the sketch below).

```python
windows = grouped_df['price'].expanding(min_periods=2)
```
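For comparison, a small sketch of the min_periods=1 variant mentioned above, which yields 0 instead of NaN on the first row of each group (the variable name is my own):

```python
# With a one-element window, get_variation returns (x - x) / x == 0
zero_based = grouped_df['price'].expanding(min_periods=1).apply(get_variation)
print(zero_based)
```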
4. Calling apply on the expanding window operation and passing your custom variation function defined in step 1 as argument

Calling .apply(get_variation) will apply your custom variation formula to each resulting window and return the result.

```python
grouped_variations = windows.apply(get_variation)
```
| security | | price |
|---|---|---|
| AAPL | 3 | NaN |
| | 4 | -0.02 |
| | 5 | 0.05 |
| | 6 | 0.07 |
| IBM | 0 | NaN |
| | 1 | 0.02 |
| | 2 | 0.08 |
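A side note for illustration: you can confirm the index structure of this result directly before dropping a level.

```python
# The outer level holds the group label, the inner level the original row index
print(grouped_variations.index)
# e.g. MultiIndex([('AAPL', 3), ('AAPL', 4), ...], names=['security', None])
```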
As you can see in the output of step 4, the data is presented with a MultiIndex. We get rid of the outer index level ('AAPL', 'IBM') by calling .droplevel(0), to prepare the data for merging it correctly into the original dataframe.

```python
variations = grouped_variations.droplevel(0)
```
5. Merging the results back into the original pd.DataFrame

Finally, we assign the price variations to the original dataframe by calling df.assign. The data will be joined into the destination on its index.

```python
df = df.assign(change=variations)
```
| | security | date | price | change |
|---|---|---|---|---|
| 0 | IBM | 1/1/2016 | 100 | NaN |
| 1 | IBM | 1/2/2016 | 102 | 0.02 |
| 2 | IBM | 1/3/2016 | 108 | 0.08 |
| 3 | AAPL | 1/1/2016 | 1000 | NaN |
| 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
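If you need the change column rendered as whole percentages like the desired output in the question ('2%', '-2%', ...), here is a hedged sketch of one way to format it (the change_pct column name and the rounding choice are my own):

```python
# Multiply the fraction by 100 and format it as a percent string, keeping NaN as-is
df['change_pct'] = df['change'].mul(100).round(0).map(
    lambda v: f'{v:.0f}%' if pd.notna(v) else v
)
print(df)
```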