I'm not sure where to start with this so apologies for my lack of an attempt.
This is the initial shape of my data:
df = pd.DataFrame({
'Year-Mth': ['1900-01'
,'1901-02'
,'1903-02'
,'1903-03'
,'1903-04'
,'1911-08'
,'1911-09'],
'Category': ['A','A','B','B','B','B','B'],
'SubCategory': ['X','Y','Y','Y','Z','Q','Y'],
'counter': [1,1,1,1,1,1,1]
})
df
This is the result I'd like to get to - the Mth-Year in the below has been resampled to 4 year buckets:
If possible I'd like to do this via a process that makes 'Year-Mth' resamplable - so I can easily switch to different buckets.
Here's my attempt:
df['Year'] = pd.cut(df['Year-Mth'].str[:4].astype(int),
bins=np.arange(1900, 1920, 5), right=False)
df.pivot_table(index=['SubCategory', 'Year'], columns='Category',
values='counter', aggfunc='sum').dropna(how='all').fillna(0)
Out:
Category A B
SubCategory Year
Q [1910, 1915) 0.0 1.0
X [1900, 1905) 1.0 0.0
Y [1900, 1905) 1.0 2.0
[1910, 1915) 0.0 1.0
Z [1900, 1905) 0.0 1.0
The year column is not parameterized as pandas (or numpy) does not offer a cut option with step size, as far as I know. But I think it can be done with a little arithmetic on minimums/maximums. Something like:
df['Year'] = pd.to_datetime(df['Year-Mth']).dt.year
df['Year'] = pd.cut(df['Year'], bins=np.arange(df['Year'].min(),
df['Year'].max() + 5, 5), right=False)
This wouldn't create nice bins like Excel does, though.
cols = [df.SubCategory, pd.to_datetime(df['Year-Mth']), df.Category]
df1 = df.set_index(cols).counter
df1.unstack('Year-Mth').T.resample('60M', how='sum').stack(0).swaplevel(0, 1).sort_index().fillna('')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With