Reshaping/Pivoting Data with Date Value

I need to pivot/reshape long-form data in two ways: 1) adding end-of-month date columns and filling them with a numeric value (TOTAL); 2) adding the same end-of-month date columns but filling them with a date value (the day on which the TOTAL from the first pivot was reached).

I can do the first with:

import pandas as pd

data = pd.DataFrame({'date': ['1-12-2016', '1-23-2016', '2-23-2016', '2-1-2016', '3-4-2016'],
        'EOM': ['1-31-2016', '1-31-2016', '2-28-2016', '2-28-2016', '3-31-2016'],
        'country':['uk', 'usa', 'fr','fr','uk'],
        'tr_code': [10, 21, 20, 10,12],
        'TOTAL': [435, 367,891,1234,231]
        })

data['EOM'] = pd.to_datetime(data['EOM'])
data['date'] = pd.to_datetime(data['date'])


data_total = data.pivot_table(values='TOTAL', index=['country','tr_code'], columns='EOM')

Out[73]: 
EOM              2016-01-31  2016-02-28  2016-03-31
country tr_code                                    
fr      10              NaN      1234.0         NaN
        20              NaN       891.0         NaN
uk      10            435.0         NaN         NaN
        12              NaN         NaN       231.0
usa     21            367.0         NaN         NaN

However, changing the values argument to 'date' raises: DataError: No numeric types to aggregate

I basically want two DataFrames: the one I already have, and another in the same format that holds, instead of the 'TOTAL' value, the 'date' on which that total was reached.

Any help is greatly appreciated.

HowdyDude asked Apr 08 '26 20:04


1 Answer

set_index with unstack

This assumes that each combination of ['country', 'tr_code', 'EOM'] is unique; unstack raises an error if it is not. That is why an aggregation function matters: we need a rule for the case where a combination has multiple observations.

data.set_index(['country', 'tr_code', 'EOM']).date.unstack()

EOM             2016-01-31 2016-02-28 2016-03-31
country tr_code                                 
fr      10             NaT 2016-02-01        NaT
        20             NaT 2016-02-23        NaT
uk      10      2016-01-12        NaT        NaT
        12             NaT        NaT 2016-03-04
usa     21      2016-01-23        NaT        NaT
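
To illustrate the uniqueness caveat, here is a minimal sketch. The extra row is hypothetical (it is not part of the question's data) and is added only to create a repeated ['country', 'tr_code', 'EOM'] combination; with it in place, unstack should raise a ValueError rather than silently pick a value.

```python
import pandas as pd

data = pd.DataFrame({
    'date': ['1-12-2016', '1-23-2016', '2-23-2016', '2-1-2016', '3-4-2016'],
    'EOM': ['1-31-2016', '1-31-2016', '2-28-2016', '2-28-2016', '3-31-2016'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'tr_code': [10, 21, 20, 10, 12],
    'TOTAL': [435, 367, 891, 1234, 231],
})

# Hypothetical extra row (not in the question) repeating ('uk', 10, '1-31-2016').
extra = pd.DataFrame({'date': ['1-20-2016'], 'EOM': ['1-31-2016'],
                      'country': ['uk'], 'tr_code': [10], 'TOTAL': [100]})
dup = pd.concat([data, extra], ignore_index=True)
dup['EOM'] = pd.to_datetime(dup['EOM'])
dup['date'] = pd.to_datetime(dup['date'])

keys = ['country', 'tr_code', 'EOM']

# Check for repeated key combinations before reshaping.
has_duplicates = dup.duplicated(subset=keys).any()

try:
    dup.set_index(keys)['date'].unstack()
    unstack_failed = False
except ValueError as err:
    # unstack cannot place two values into the same cell.
    unstack_failed = True
    print('unstack failed:', err)
```

Checking dup.duplicated(subset=keys).any() first is a cheap way to decide whether plain unstack is safe or an aggregation rule is needed.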

aggfunc / pivot_table

The default aggregation function is mean, which makes no sense for dates. first will do. last would work as well; ALollz used it in their deleted answer.

data.pivot_table(
    values='date', index=['country', 'tr_code'], columns='EOM', aggfunc='first')

EOM             2016-01-31 2016-02-28 2016-03-31
country tr_code                                 
fr      10             NaT 2016-02-01        NaT
        20             NaT 2016-02-23        NaT
uk      10      2016-01-12        NaT        NaT
        12             NaT        NaT 2016-03-04
usa     21      2016-01-23        NaT        NaT
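
The choice of aggfunc only matters when a combination occurs more than once. As a sketch, again assuming a hypothetical duplicate row that is not in the question's data, 'min' and 'max' pick the earliest and latest date explicitly, whereas 'first' and 'last' depend on row order:

```python
import pandas as pd

# Question's data plus one hypothetical row repeating ('uk', 10, '1-31-2016').
dup = pd.DataFrame({
    'date': ['1-12-2016', '1-23-2016', '2-23-2016', '2-1-2016', '3-4-2016', '1-20-2016'],
    'EOM': ['1-31-2016', '1-31-2016', '2-28-2016', '2-28-2016', '3-31-2016', '1-31-2016'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk', 'uk'],
    'tr_code': [10, 21, 20, 10, 12, 10],
    'TOTAL': [435, 367, 891, 1234, 231, 100],
})
dup['EOM'] = pd.to_datetime(dup['EOM'])
dup['date'] = pd.to_datetime(dup['date'])

# Earliest date per (country, tr_code, EOM) cell.
earliest = dup.pivot_table(
    values='date', index=['country', 'tr_code'], columns='EOM', aggfunc='min')

# Latest date per cell.
latest = dup.pivot_table(
    values='date', index=['country', 'tr_code'], columns='EOM', aggfunc='max')

jan = pd.Timestamp('2016-01-31')
print(earliest.loc[('uk', 10), jan])
print(latest.loc[('uk', 10), jan])
```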

groupby

A less glamorous way of doing the same thing as pivot_table:

data.groupby(['country', 'tr_code', 'EOM']).date.first().unstack()

EOM             2016-01-31 2016-02-28 2016-03-31
country tr_code                                 
fr      10             NaT 2016-02-01        NaT
        20             NaT 2016-02-23        NaT
uk      10      2016-01-12        NaT        NaT
        12             NaT        NaT 2016-03-04
usa     21      2016-01-23        NaT        NaT
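
As a quick sanity check, the three approaches can be compared on the question's data. This is only a sketch: it relies on the key combinations being unique, as they are in the sample, and uses DataFrame.equals to compare the results.

```python
import pandas as pd

data = pd.DataFrame({
    'date': ['1-12-2016', '1-23-2016', '2-23-2016', '2-1-2016', '3-4-2016'],
    'EOM': ['1-31-2016', '1-31-2016', '2-28-2016', '2-28-2016', '3-31-2016'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'tr_code': [10, 21, 20, 10, 12],
    'TOTAL': [435, 367, 891, 1234, 231],
})
data['EOM'] = pd.to_datetime(data['EOM'])
data['date'] = pd.to_datetime(data['date'])

keys = ['country', 'tr_code', 'EOM']

# 1) set_index + unstack (requires unique key combinations)
via_unstack = data.set_index(keys)['date'].unstack()

# 2) pivot_table with an explicit aggregation function
via_pivot = data.pivot_table(
    values='date', index=['country', 'tr_code'], columns='EOM', aggfunc='first')

# 3) groupby + first + unstack
via_groupby = data.groupby(keys)['date'].first().unstack()

print(via_unstack.equals(via_pivot), via_unstack.equals(via_groupby))
```
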

piRSquared answered Apr 11 '26 20:04