I have a dataframe with columns "Dates" and "Nums":
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2001', '1/1/2003', freq='d')
nums = [np.random.randint(100) for x in range(len(dates))]
df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW != 'Sun')]    # drop weekends
df = df.drop([7, 18]).reset_index(drop=True)      # drop two weekdays to simulate missing days
I need to bin the dataframe so I can isolate each week separately. The ultimate goal is to look at the MAX 'Nums' value for each week and compare it to the LAST value of the following week to see what the percent change is. For example:
week1 = df[0:5]
week2 = df[5:9]
week3 = df[9:12]
In [156]: w1max = week1.Nums.max(); w1max
Out[156]: 97
In [157]: w2Last = week2.iloc[-1].Nums; w2Last
Out[157]: 76
pctChange = (w2Last-w1max)/float(w1max)
In [166]: pctChange
Out[166]: -0.21649484536082475
The problem is that some weeks are missing days (e.g., week2 is missing Monday and week3 is missing Friday). So how do I separate them out?
The closest thing seems to be df.resample(), but I don't know how to make the comparison I need using it.
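As a quick illustration of the binning itself, pd.Grouper (which the answer below relies on) can split the frame into calendar weeks even when days are missing; a minimal sketch, assuming the df built above:

for week_end, week in df.groupby(pd.Grouper(key='Dates', freq='W')):
    # each `week` holds one calendar week's rows (weeks end on Sunday with freq='W'),
    # however many weekdays that week actually contains
    print(week_end.date(), week['Nums'].tolist())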
import numpy as np
import pandas as pd
np.random.seed(2016)

# Rebuild the example data from the question
dates = pd.date_range('1/1/2001', '1/1/2003', freq='d')
nums = [np.random.randint(100) for x in range(len(dates))]
df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW != 'Sun')]
df = df.drop([7, 18]).reset_index(drop=True)

# Group by calendar week, take each week's max and last 'Nums',
# then compare each week's last value to the previous week's max
df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max', 'last'])
df2['previous_max'] = df2['max'].shift(1)
df2['change'] = (df2['last'] - df2['previous_max']) / df2['previous_max']
print(df2.head())
yields
max last previous_max change
Dates
2001-01-07 83 39 NaN NaN
2001-01-14 75 75 83.0 -0.096386
2001-01-21 97 18 75.0 -0.760000
2001-01-28 72 37 97.0 -0.618557
2001-02-04 84 24 72.0 -0.666667
df.groupby with a pd.Grouper object can be used to group the rows by weeks. You can then use the agg method to find both the max and the last value of Nums in each group:
In [163]: df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max','last'])
In [164]: df2.head()
Out[164]:
max last
Dates
2001-01-07 83 39
2001-01-14 75 75
2001-01-21 97 18
2001-01-28 72 37
2001-02-04 84 24
Then use shift(1) to shift the max values down by one row:
In [165]: df2['previous_max'] = df2['max'].shift(1); df2.head()
Out[165]:
max last previous_max
Dates
2001-01-07 83 39 NaN
2001-01-14 75 75 83.0
2001-01-21 97 18 75.0
2001-01-28 72 37 97.0
2001-02-04 84 24 72.0
Then the percent change can be computed by simple subtraction and division:
In [166]: df2['change'] = (df2['last']-df2['previous_max'])/df2['previous_max']; df2.head()
Out[166]:
max last previous_max change
Dates
2001-01-07 83 39 NaN NaN
2001-01-14 75 75 83.0 -0.096386
2001-01-21 97 18 75.0 -0.760000
2001-01-28 72 37 97.0 -0.618557
2001-02-04 84 24 72.0 -0.666667
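Since the question mentions df.resample: the same weekly table can also be built with resample instead of groupby/pd.Grouper. A minimal sketch, assuming the df built above (DataFrame.resample's on= parameter picks the datetime column to bin on):

# resample('W', on='Dates') bins the rows into calendar weeks, just like the Grouper above
df3 = df.resample('W', on='Dates')['Nums'].agg(['max', 'last'])
df3['previous_max'] = df3['max'].shift(1)
df3['change'] = (df3['last'] - df3['previous_max']) / df3['previous_max']
print(df3.head())

The choice between the two is largely stylistic: resample reads naturally for pure time binning, while pd.Grouper also composes with additional groupby keys.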