Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas pivot table arrangement no aggregation

I want to pivot a pandas dataframe without aggregation, and instead of presenting the pivot index column vertically I want to present it horizontally. I tried with pd.pivot_table but I'm not getting exactly what I wanted.

data = {'year': [2011, 2011, 2012, 2013, 2013],
        'A': [10, 21, 20, 10, 39],
        'B': [12, 45, 19, 10, 39]}

df = pd.DataFrame(data)
print df
    A   B  year
0  10  12  2011
1  21  45  2011
2  20  19  2012
3  10  10  2013
4  39  39  2013

But I want to have:

year      2011     2012      2013
cols     A    B   A    B    A    B
0       10    12  20   19   10   10
1       21    45  NaN  NaN  39   39
like image 207
DougKruger Avatar asked Jul 27 '16 07:07

DougKruger


People also ask

How do you not aggregate in a pivot table?

Inside the Pivot Column dialog, select the column with the values that will populate the new columns to be created. In this case "Time" but could be any field type, including text. In the Advanced Options part, select "Don´t Aggregate" so the values will displayed without any modification.

Can we use pivot without aggregate function in SQL Server?

The answer is no: PIVOT requires aggregation.

What is the difference between pivoting and unstacking?

The process of stacking pivots a level of column labels to the row index. Unstacking performs the opposite, pivoting a level of the row index into the column index.

What is the difference between pivot table and Groupby in pandas?

What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.


2 Answers

groupby('year') so I can reset_index to get index values of 0 and 1. Then do a bunch of clean up.

df.groupby('year')['A', 'B'] \
    .apply(lambda df: df.reset_index(drop=True)) \
    .unstack(0).swaplevel(0, 1, 1).sort_index(1)

enter image description here

like image 141
piRSquared Avatar answered Oct 24 '22 16:10

piRSquared


You can first create column for new index by cumcount, then stack with unstack:

df['g'] = df.groupby('year')['year'].cumcount()
df1 = df.set_index(['g','year']).stack().unstack([1,2])
print (df1)

year  2011        2012        2013      
         A     B     A     B     A     B
g                                       
0     10.0  12.0  20.0  19.0  10.0  10.0
1     21.0  45.0   NaN   NaN  39.0  39.0

If need set columns names use rename_axis (new in pandas 0.18.0):

df['g'] = df.groupby('year')['year'].cumcount()
df1 = df.set_index(['g','year'])
        .stack()
        .unstack([1,2])
        .rename_axis(None)
        .rename_axis(('year','cols'), axis=1)
print (df1)
year  2011        2012        2013      
cols     A     B     A     B     A     B
0     10.0  12.0  20.0  19.0  10.0  10.0
1     21.0  45.0   NaN   NaN  39.0  39.0

Another solution with pivot, but you need swap first and second level of Multiindex in columns by swaplevel and then sort it by sort_index:

df['g'] = df.groupby('year')['year'].cumcount()
df1 = df.pivot(index='g', columns='year')
df1 = df1.swaplevel(0,1, axis=1).sort_index(axis=1)
print (df1)
year  2011        2012        2013      
         A     B     A     B     A     B
g                                       
0     10.0  12.0  20.0  19.0  10.0  10.0
1     21.0  45.0   NaN   NaN  39.0  39.0
print (df1)

year  2011        2012        2013      
         A     B     A     B     A     B
g                                       
0     10.0  12.0  20.0  19.0  10.0  10.0
1     21.0  45.0   NaN   NaN  39.0  39.0
like image 31
jezrael Avatar answered Oct 24 '22 14:10

jezrael