Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: break categorical column to multiple columns

Imagine a Pandas dataframe of the following format:

id  type  v1  v2
1   A     6   9
1   B     4   2
2   A     3   7
2   B     3   6

I would like to convert this dataframe into the following format:

id  A_v1  A_v2  B_v1  B_v2
1   6     9     4     2
2   3     7     3     6

Is there an elegant way of doing this?

like image 681
Logister Avatar asked Jun 22 '15 21:06

Logister


1 Answers

You could use set_index to move the type and id columns into the index, and then unstack to move the type index level into the column index. You don't have to worry about the v values -- where the indexes go dictate the arrangement of the values.

The result is a DataFrame with a MultiIndex for the column index:

In [181]: df.set_index(['type', 'id']).unstack(['type'])
Out[181]: 
     v1    v2   
type  A  B  A  B
id              
1     6  4  9  2
2     3  3  7  6

Generally, a MultiIndex is preferable to a flattened column index. It gives you better ways to select or manipulate your data based on type or v value.

If you wish to reorder the columns to exactly match the order shown in the desired output, you could use df.reindex:

df = df.reindex(columns=sorted(df.columns, key=lambda x: x[::-1]))

yields

     v1 v2 v1 v2
type  A  A  B  B
id              
1     6  9  4  2
2     3  7  3  6

And if you wish to flatten the column index to a single level, then

df.columns = ['{}_{}'.format(t, v) for v,t in df.columns]

yields

    A_v1  A_v2  B_v1  B_v2
id                        
1      6     9     4     2
2      3     7     3     6
like image 154
unutbu Avatar answered Oct 04 '22 08:10

unutbu