Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set column names when stacking pandas DataFrame

Tags:

python

pandas

When stacking a pandas DataFrame, a Series is returned. Normally after I stack a DataFrame, I convert it back into a DataFrame. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.

E.g., for the following DataFrame:

In [64]: df = pd.DataFrame({'id':[1,2,3],      ...:                    'date':['2015-09-31']*3,      ...:                    'value':[100, 95, 42],      ...:                    'value2':[200, 57, 27]}).set_index(['id','date'])  In [65]: df Out[65]:                 value  value2 id date                      1  2015-09-31    100     200 2  2015-09-31     95      57 3  2015-09-31     42      27 

I stack and convert it back to a DataFrame like so:

In [68]: df.stack().reset_index() Out[68]:     id        date level_2    0 0   1  2015-09-31   value  100 1   1  2015-09-31  value2  200 2   2  2015-09-31   value   95 3   2  2015-09-31  value2   57 4   3  2015-09-31   value   42 5   3  2015-09-31  value2   27 

So in order to name these columns appropriately I would need to do something like this:

In [72]: stacked = df.stack()  In [73]: stacked Out[73]:  id  date               1   2015-09-31  value     100                 value2    200 2   2015-09-31  value      95                 value2     57 3   2015-09-31  value      42                 value2     27 dtype: int64  In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)  In [88]: stacked.reset_index().rename(columns={0:'value'}) Out[88]:     id        date var_name  value 0   1  2015-09-31    value    100 1   1  2015-09-31   value2    200 2   2  2015-09-31    value     95 3   2  2015-09-31   value2     57 4   3  2015-09-31    value     42 5   3  2015-09-31   value2     27 

Ideally, the solution would look something like this:

df.stack(new_index_name='var_name', new_col_name='value') 

But looking at the docs it doesn't look like stack takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?

like image 342
aensm Avatar asked Oct 07 '15 17:10

aensm


1 Answers

So here's one way that you may find a bit cleaner, using the fact that columns and Series can also carry names.

In [45]: df Out[45]:                 value  value2 id date                      1  2015-09-31    100     200 2  2015-09-31     95      57 3  2015-09-31     42      27  In [46]: df.columns.name = 'var_name'  In [47]: s = df.stack()  In [48]: s.name = 'value'  In [49]: s.reset_index() Out[49]:     id        date var_name  value 0   1  2015-09-31    value    100 1   1  2015-09-31   value2    200 2   2  2015-09-31    value     95 3   2  2015-09-31   value2     57 4   3  2015-09-31    value     42 5   3  2015-09-31   value2     27 
like image 116
chrisb Avatar answered Sep 21 '22 22:09

chrisb