Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complex pivot and resample

Tags:

python

pandas

I'm not sure where to start with this so apologies for my lack of an attempt.

This is the initial shape of my data:

df = pd.DataFrame({
    'Year-Mth': ['1900-01'
                 ,'1901-02'
                 ,'1903-02'
                 ,'1903-03'
                 ,'1903-04'
                 ,'1911-08'
                 ,'1911-09'], 
    'Category': ['A','A','B','B','B','B','B'], 
    'SubCategory': ['X','Y','Y','Y','Z','Q','Y'], 
    'counter': [1,1,1,1,1,1,1]
})

df

This is the result I'd like to get to - the Mth-Year in the below has been resampled to 4 year buckets:

enter image description here

If possible I'd like to do this via a process that makes 'Year-Mth' resamplable - so I can easily switch to different buckets.

like image 671
whytheq Avatar asked Feb 06 '23 04:02

whytheq


2 Answers

Here's my attempt:

df['Year'] = pd.cut(df['Year-Mth'].str[:4].astype(int), 
                    bins=np.arange(1900, 1920, 5), right=False)
df.pivot_table(index=['SubCategory', 'Year'], columns='Category', 
               values='counter', aggfunc='sum').dropna(how='all').fillna(0)
Out: 
Category                    A    B
SubCategory Year                  
Q           [1910, 1915)  0.0  1.0
X           [1900, 1905)  1.0  0.0
Y           [1900, 1905)  1.0  2.0
            [1910, 1915)  0.0  1.0
Z           [1900, 1905)  0.0  1.0

The year column is not parameterized as pandas (or numpy) does not offer a cut option with step size, as far as I know. But I think it can be done with a little arithmetic on minimums/maximums. Something like:

df['Year'] = pd.to_datetime(df['Year-Mth']).dt.year
df['Year'] = pd.cut(df['Year'], bins=np.arange(df['Year'].min(), 
                    df['Year'].max() + 5, 5), right=False)

This wouldn't create nice bins like Excel does, though.

like image 54
ayhan Avatar answered Feb 08 '23 18:02

ayhan


cols = [df.SubCategory, pd.to_datetime(df['Year-Mth']), df.Category]
df1 = df.set_index(cols).counter

df1.unstack('Year-Mth').T.resample('60M', how='sum').stack(0).swaplevel(0, 1).sort_index().fillna('')

enter image description here

like image 44
piRSquared Avatar answered Feb 08 '23 18:02

piRSquared