Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dividing Pandas Dataframe by Week

I have a dataframe with columns "Date" and "Num".

dates = pd.date_range('1/1/2001','1/1/2003', freq = 'd')
nums = [np.random.randint(100) for x in range(len(dates))]

df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW !='Sun')]
df = df.drop([7,18]).reset_index(drop = True)

I need to bin the dataframe so I can isolate each week separately. The ultimate goal is to look at the MAX 'Nums' value for each week and compare it to the LAST value of the following week to see what the percent change is. For example:

week1 = df[0:5]
week2 = df[5:9]
week3 = df[9:12]

In [156]: w1max = week1.Nums.max()
Out[156]: 97

In [157]: w2Last = week2.iloc[-1].Nums
Out[157]: 76

pctChange = (w2Last-w1max)/float(w1max)

In [166]: pctChange
Out[166]: -0.21649484536082475

The problem is that some weeks are missing days (eg week2 is missing monday and week3 is missing friday). So how do I separate them out?

The closest thing seems to be using df.resample() but I don't know how to make the comparison I'm trying to make using that.

like image 230
RSHAP Avatar asked Jul 02 '16 21:07

RSHAP


1 Answers

import numpy as np
import pandas as pd
np.random.seed(2016)

dates = pd.date_range('1/1/2001','1/1/2003', freq = 'd')
nums = [np.random.randint(100) for x in range(len(dates))]

df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW !='Sun')]
df = df.drop([7,18]).reset_index(drop = True)

df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max','last'])
df2['previous_max'] = df2['max'].shift(1)
df2['change'] = (df2['last']-df2['previous_max'])/df2['previous_max']
print(df2.head())

yields

            max  last  previous_max    change
Dates                                        
2001-01-07   83    39           NaN       NaN
2001-01-14   75    75          83.0 -0.096386
2001-01-21   97    18          75.0 -0.760000
2001-01-28   72    37          97.0 -0.618557
2001-02-04   84    24          72.0 -0.666667

df.groupby with a pd.Grouper object can be used to group the rows by weeks. You can use the agg method to find both the max and the last value for the Nums in each group:

In [163]: df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max','last'])

In [164]: df2.head()
Out[164]: 
            max  last
Dates                
2001-01-07   83    39
2001-01-14   75    75
2001-01-21   97    18
2001-01-28   72    37
2001-02-04   84    24

Then use shift(1) to shift the max values down by one row:

In [165]: df2['previous_max'] = df2['max'].shift(1); df2.head()
Out[165]: 
            max  last  previous_max
Dates                              
2001-01-07   83    39           NaN
2001-01-14   75    75          83.0
2001-01-21   97    18          75.0
2001-01-28   72    37          97.0
2001-02-04   84    24          72.0

Then the percent change can be computed by simple subtraction and division:

In [166]: df2['change'] = (df2['last']-df2['previous_max'])/df2['previous_max']; df2.head()
Out[166]: 
            max  last  previous_max    change
Dates                                        
2001-01-07   83    39           NaN       NaN
2001-01-14   75    75          83.0 -0.096386
2001-01-21   97    18          75.0 -0.760000
2001-01-28   72    37          97.0 -0.618557
2001-02-04   84    24          72.0 -0.666667
like image 137
unutbu Avatar answered Oct 23 '22 23:10

unutbu