When stacking a pandas DataFrame
, a Series
is returned. Normally after I stack a DataFrame
, I convert it back into a DataFrame
. However, the default names coming from the stacked data make renaming the columns a bit hacky. What I'm looking for is an easier/built-in way to give columns sensible names after stacking.
E.g., for the following DataFrame
:
In [64]: df = pd.DataFrame({'id':[1,2,3], ...: 'date':['2015-09-31']*3, ...: 'value':[100, 95, 42], ...: 'value2':[200, 57, 27]}).set_index(['id','date']) In [65]: df Out[65]: value value2 id date 1 2015-09-31 100 200 2 2015-09-31 95 57 3 2015-09-31 42 27
I stack and convert it back to a DataFrame
like so:
In [68]: df.stack().reset_index() Out[68]: id date level_2 0 0 1 2015-09-31 value 100 1 1 2015-09-31 value2 200 2 2 2015-09-31 value 95 3 2 2015-09-31 value2 57 4 3 2015-09-31 value 42 5 3 2015-09-31 value2 27
So in order to name these columns appropriately I would need to do something like this:
In [72]: stacked = df.stack() In [73]: stacked Out[73]: id date 1 2015-09-31 value 100 value2 200 2 2015-09-31 value 95 value2 57 3 2015-09-31 value 42 value2 27 dtype: int64 In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True) In [88]: stacked.reset_index().rename(columns={0:'value'}) Out[88]: id date var_name value 0 1 2015-09-31 value 100 1 1 2015-09-31 value2 200 2 2 2015-09-31 value 95 3 2 2015-09-31 value2 57 4 3 2015-09-31 value 42 5 3 2015-09-31 value2 27
Ideally, the solution would look something like this:
df.stack(new_index_name='var_name', new_col_name='value')
But looking at the docs it doesn't look like stack
takes any such arguments. Is there an easier/built-in way in pandas to deal with this workflow?
So here's one way that you may find a bit cleaner, using the fact that columns
and Series
can also carry names.
In [45]: df Out[45]: value value2 id date 1 2015-09-31 100 200 2 2015-09-31 95 57 3 2015-09-31 42 27 In [46]: df.columns.name = 'var_name' In [47]: s = df.stack() In [48]: s.name = 'value' In [49]: s.reset_index() Out[49]: id date var_name value 0 1 2015-09-31 value 100 1 1 2015-09-31 value2 200 2 2 2015-09-31 value 95 3 2 2015-09-31 value2 57 4 3 2015-09-31 value 42 5 3 2015-09-31 value2 27
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With