Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Dataframe: Change Index to Year and Columns to Month

Currently I'm working with some data from FRED that I pulled using quandl. The data I receive is in a dataframe.

It currently returns like so

DATE    VALUE
1966-01-01  31.880
1966-02-01  32.080
1966-03-01  32.180
1966-04-01  32.280
... ...
2013-08-01  233.323
2013-09-01  233.632
2013-10-01  233.718
2013-11-01  234.121

But I would like to reorganize for displaying as:

        Jan     Feb     Mar     Apr...
1966    31.880  32.080  32.180  32.280
...
2013    233.323 233.632 233.718 234.121

Edit: I thought this would be useful.

DatetimeIndex(['1966-01-01', '1966-02-01', '1966-03-01', '1966-04-01',
               '1966-05-01', '1966-06-01', '1966-07-01', '1966-08-01',
               '1966-09-01', '1966-10-01',
               ...
               '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
               '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
               '2015-12-01', '2016-01-01'],
              dtype='datetime64[ns]', name=u'DATE', length=601, freq=None)
Index([u'VALUE'], dtype='object')
like image 206
user3093175 Avatar asked Dec 06 '25 08:12

user3093175


2 Answers

You can use pandas.crosstab:

df.DATE = pd.to_datetime(df.DATE)    # optional if the DATE column is already datetime type

pd.crosstab(df.DATE.dt.year, 
            df.DATE.dt.strftime("%b"), 
            df.VALUE, 
            aggfunc="sum",
            rownames=["Year"],
            colnames=["Month"])

enter image description here

Update: In case you have Date as index:

pd.crosstab(df.index.year, 
            df.index.strftime("%b"), 
            df.VALUE, 
            aggfunc="sum",
            rownames=["Year"],
            colnames=["Month"])
like image 189
Psidom Avatar answered Dec 08 '25 21:12

Psidom


I like to construct this with pd.Series and unstack

pd.Series(
    df.VALUE.values,
    [df.DATE.dt.year.values,
     df.DATE.dt.strftime("%b").values]
).unstack()

        Apr      Aug    Feb    Jan    Mar      Nov      Oct      Sep
1966  32.28      NaN  32.08  31.88  32.18      NaN      NaN      NaN
2013    NaN  233.323    NaN    NaN    NaN  234.121  233.718  233.632

Or equivalently with

df.set_index(
    [df.DATE.dt.year,
     df.DATE.dt.strftime('%b')],
).VALUE.rename_axis([None, None]).unstack()

naive timing

enter image description here

like image 40
piRSquared Avatar answered Dec 08 '25 21:12

piRSquared