I need to pivot/reshape long-form data in two ways: 1) add end-of-month date columns and fill them with a numeric value (the total); 2) add the same end-of-month date columns, but fill them with a date value (the day on which the 'total' value in the first pivot was reached).
I can do 1 with:
import pandas as pd

data = pd.DataFrame({'date': ['1-12-2016', '1-23-2016', '2-23-2016', '2-1-2016', '3-4-2016'],
                     'EOM': ['1-31-2016', '1-31-2016', '2-28-2016', '2-28-2016', '3-31-2016'],
                     'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
                     'tr_code': [10, 21, 20, 10, 12],
                     'TOTAL': [435, 367, 891, 1234, 231]
                     })
data['EOM'] = pd.to_datetime(data['EOM'])
data['date'] = pd.to_datetime(data['date'])
data_total = data.pivot_table(values='TOTAL', index=['country','tr_code'], columns='EOM')
Out[73]:
EOM              2016-01-31  2016-02-28  2016-03-31
country tr_code
fr      10              NaN      1234.0         NaN
        20              NaN       891.0         NaN
uk      10            435.0         NaN         NaN
        12              NaN         NaN       231.0
usa     21            367.0         NaN         NaN
However, changing the values argument to 'date' raises: DataError: No numeric types to aggregate
I basically want two DataFrames: the one I already have, and another in the same format that shows, instead of the 'TOTAL' value, the 'date' on which that total was reached.
Any help is greatly appreciated.
set_index with unstack
This assumes the combinations of ['country', 'tr_code', 'EOM'] are unique and will break if they are not. This is why an aggregation function is important. We need a rule if and when we get multiple observations of a combination.
data.set_index(['country', 'tr_code', 'EOM']).date.unstack()
EOM             2016-01-31 2016-02-28 2016-03-31
country tr_code
fr      10             NaT 2016-02-01        NaT
        20             NaT 2016-02-23        NaT
uk      10      2016-01-12        NaT        NaT
        12             NaT        NaT 2016-03-04
usa     21      2016-01-23        NaT        NaT
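The uniqueness assumption can be checked before reshaping. The following is a minimal sketch rather than part of the original answer: the helper name has_duplicate_keys and the extra repeated row are made up for illustration, and the sample frame is rebuilt (with an explicit date format) so the snippet runs on its own.

```python
import pandas as pd

keys = ['country', 'tr_code', 'EOM']

data = pd.DataFrame({
    'date': ['1-12-2016', '1-23-2016', '2-23-2016', '2-1-2016', '3-4-2016'],
    'EOM': ['1-31-2016', '1-31-2016', '2-28-2016', '2-28-2016', '3-31-2016'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'tr_code': [10, 21, 20, 10, 12],
    'TOTAL': [435, 367, 891, 1234, 231],
})
data['EOM'] = pd.to_datetime(data['EOM'], format='%m-%d-%Y')
data['date'] = pd.to_datetime(data['date'], format='%m-%d-%Y')


def has_duplicate_keys(df, cols):
    """Hypothetical helper: True if any combination of `cols` repeats."""
    return bool(df.duplicated(subset=cols).any())


# Illustrative only: repeat the first row so one key combination appears twice.
with_dup = pd.concat([data, data.iloc[[0]]], ignore_index=True)

unstack_failed = False
try:
    with_dup.set_index(keys)['date'].unstack()
except ValueError:
    # unstack cannot place two values into the same cell
    unstack_failed = True

print(has_duplicate_keys(data, keys))      # sample data as posted
print(has_duplicate_keys(with_dup, keys))  # sample data plus the repeated row
print(unstack_failed)
```

If the check reports duplicates, one of the aggregating approaches is the safer choice.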
aggfunc / pivot_table
The default aggregation function is mean, which makes no sense for dates. first will do. last would also work; ALollz used it in their deleted answer.
data.pivot_table(
    values='date', index=['country', 'tr_code'], columns='EOM', aggfunc='first')
EOM             2016-01-31 2016-02-28 2016-03-31
country tr_code
fr      10             NaT 2016-02-01        NaT
        20             NaT 2016-02-23        NaT
uk      10      2016-01-12        NaT        NaT
        12             NaT        NaT 2016-03-04
usa     21      2016-01-23        NaT        NaT
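The choice between first and last only matters when a combination occurs more than once. Here is a small sketch of the difference, using made-up rows (not the question's data) in which two observations share the same country, tr_code and EOM; the wrapper pivot_dates is a hypothetical convenience function.

```python
import pandas as pd

# Illustrative data: the first two rows share the same (country, tr_code, EOM).
df = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-12', '2016-01-20', '2016-02-01']),
    'EOM': pd.to_datetime(['2016-01-31', '2016-01-31', '2016-02-28']),
    'country': ['uk', 'uk', 'fr'],
    'tr_code': [10, 10, 10],
})


def pivot_dates(frame, how):
    """Hypothetical wrapper: pivot the 'date' column with the given aggfunc."""
    return frame.pivot_table(
        values='date', index=['country', 'tr_code'], columns='EOM', aggfunc=how)


first_tbl = pivot_dates(df, 'first')
last_tbl = pivot_dates(df, 'last')

jan = pd.Timestamp('2016-01-31')
print(first_tbl.loc[('uk', 10), jan])  # date from the earlier row in the frame
print(last_tbl.loc[('uk', 10), jan])   # date from the later row in the frame
```

Both first and last depend on row order, so sorting the frame by date beforehand decides which observation wins.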
groupby
Less glamorous way of doing the same thing as pivot_table
data.groupby(['country', 'tr_code', 'EOM']).date.first().unstack()
EOM             2016-01-31 2016-02-28 2016-03-31
country tr_code
fr      10             NaT 2016-02-01        NaT
        20             NaT 2016-02-23        NaT
uk      10      2016-01-12        NaT        NaT
        12             NaT        NaT 2016-03-04
usa     21      2016-01-23        NaT        NaT
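To see that the groupby route and pivot_table agree on the sample data, the two results can be compared directly. This is only a sketch: the variable names are made up, and the frame is rebuilt with an explicit date format so the snippet is self-contained.

```python
import pandas as pd

data = pd.DataFrame({
    'date': ['1-12-2016', '1-23-2016', '2-23-2016', '2-1-2016', '3-4-2016'],
    'EOM': ['1-31-2016', '1-31-2016', '2-28-2016', '2-28-2016', '3-31-2016'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'tr_code': [10, 21, 20, 10, 12],
    'TOTAL': [435, 367, 891, 1234, 231],
})
data['EOM'] = pd.to_datetime(data['EOM'], format='%m-%d-%Y')
data['date'] = pd.to_datetime(data['date'], format='%m-%d-%Y')

# Group on all three keys, keep the first date per group, then move EOM to columns.
via_groupby = data.groupby(['country', 'tr_code', 'EOM'])['date'].first().unstack()

# The same reshape expressed with pivot_table.
via_pivot = data.pivot_table(
    values='date', index=['country', 'tr_code'], columns='EOM', aggfunc='first')

# DataFrame.equals treats missing values in the same position as equal.
same = via_groupby.equals(via_pivot)
print(same)
```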