Pandas: transform a column's values into independent columns

Tags: python, pandas
I have a Pandas DataFrame that looks like the following (df_olympic). I would like the values of the column Type to be transformed into independent columns (df_olympic_table).

Original dataframe

In [3]: df_olympic
Out[3]: 
   Country    Type Num
0      USA    Gold  46
1      USA  Silver  37
2      USA  Bronze  38
3       GB    Gold  27
4       GB  Silver  23
5       GB  Bronze  17
6    China    Gold  26
7    China  Silver  18
8    China  Bronze  26
9   Russia    Gold  19
10  Russia  Silver  18
11  Russia  Bronze  19

Transformed dataframe

In [5]: df_olympic_table
Out[5]: 
  Country N_Gold N_Silver N_Bronze
0     USA     46       37       38
1      GB     27       23       17
2   China     26       18       26
3  Russia     19       18       19

What would be the most convenient way to achieve this?

565

asked Jan 08 '17 10:01

TruLa

1 Answer

You can use DataFrame.pivot:

df = df.pivot(index='Country', columns='Type', values='Num')
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       19    19      18
USA          38    46      37

Another solution with DataFrame.set_index and Series.unstack:

df = df.set_index(['Country','Type'])['Num'].unstack()
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       19    19      18
USA          38    46      37
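
Both results above sort the countries and medal types alphabetically and keep Country as the index, while the question asks for N_Gold, N_Silver, N_Bronze columns in the original country order. A minimal sketch of one way to get there, assuming the sample data from the question (the names medals, countries and df_olympic_table are only illustrative):

```python
import pandas as pd

# Sample data copied from the question
df_olympic = pd.DataFrame({
    'Country': ['USA'] * 3 + ['GB'] * 3 + ['China'] * 3 + ['Russia'] * 3,
    'Type': ['Gold', 'Silver', 'Bronze'] * 4,
    'Num': [46, 37, 38, 27, 23, 17, 26, 18, 26, 19, 18, 19],
})

medals = ['Gold', 'Silver', 'Bronze']  # desired column order
# Countries in order of first appearance, kept as a named index
countries = pd.Index(df_olympic['Country'].unique(), name='Country')

df_olympic_table = (
    df_olympic.pivot(index='Country', columns='Type', values='Num')
    .reindex(index=countries, columns=medals)  # undo the alphabetical sorting
    .add_prefix('N_')                          # Gold -> N_Gold, and so on
    .rename_axis(columns=None)                 # drop the 'Type' label on the columns
    .reset_index()                             # turn Country back into a column
)
print(df_olympic_table)
```

The reindex step is optional; without it the rows and columns simply stay in alphabetical order.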

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

you need pivot_table with an aggregate function. By default it is the mean, but you can also use sum, first, ...

# add a new row that duplicates an existing 'Country' and 'Type' pair
print (df)
   Country    Type  Num
0      USA    Gold   46
1      USA  Silver   37
2      USA  Bronze   38
3       GB    Gold   27
4       GB  Silver   23
5       GB  Bronze   17
6    China    Gold   26
7    China  Silver   18
8    China  Bronze   26
9   Russia    Gold   19
10  Russia  Silver   18
11  Russia  Bronze   20 <- value changed to 20
12  Russia  Bronze  100 <- new row duplicating 'Country' and 'Type'


df = df.pivot_table(index='Country', columns='Type', values='Num', aggfunc='mean')
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       60    19      18 <- Russia gets (100 + 20) / 2 = 60
USA          38    46      37
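
The mean is not always what you want for duplicates. As a rough illustration of the other aggregations mentioned above, here is a sketch that rebuilds the data with the duplicated row and compares 'sum' and 'first' (the names by_sum and by_first are only for this example):

```python
import pandas as pd

# Data from the answer, including the duplicated ('Russia', 'Bronze') pair
df = pd.DataFrame({
    'Country': ['USA'] * 3 + ['GB'] * 3 + ['China'] * 3 + ['Russia'] * 4,
    'Type': ['Gold', 'Silver', 'Bronze'] * 4 + ['Bronze'],
    'Num': [46, 37, 38, 27, 23, 17, 26, 18, 26, 19, 18, 20, 100],
})

# Add the duplicates together
by_sum = df.pivot_table(index='Country', columns='Type', values='Num', aggfunc='sum')
# Keep only the first value seen for each (Country, Type) pair
by_first = df.pivot_table(index='Country', columns='Type', values='Num', aggfunc='first')

print(by_sum.loc['Russia', 'Bronze'])    # 20 + 100
print(by_first.loc['Russia', 'Bronze'])  # the first of the two duplicates
```

Cells without duplicates are the same under every aggregation, so only Russia's Bronze value differs between the two tables.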

Or use groupby with mean aggregation and reshape with unstack:

df = df.groupby(['Country','Type'])['Num'].mean().unstack()
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       60    19      18 <- Russia gets (100 + 20) / 2 = 60
USA          38    46      37
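
Before picking an aggregation it can help to see which rows actually collide. A small sketch, assuming the same data with the duplicated row, that reproduces the ValueError and then lists the offending rows with DataFrame.duplicated (the names error_message and dupes are only illustrative):

```python
import pandas as pd

# Data from the answer, including the duplicated ('Russia', 'Bronze') pair
df = pd.DataFrame({
    'Country': ['USA'] * 3 + ['GB'] * 3 + ['China'] * 3 + ['Russia'] * 4,
    'Type': ['Gold', 'Silver', 'Bronze'] * 4 + ['Bronze'],
    'Num': [46, 37, 38, 27, 23, 17, 26, 18, 26, 19, 18, 20, 100],
})

# pivot cannot place two values in one (Country, Type) cell, so it raises
error_message = None
try:
    df.pivot(index='Country', columns='Type', values='Num')
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# keep=False marks every row of a duplicated pair, not only the repeats
dupes = df[df.duplicated(subset=['Country', 'Type'], keep=False)]
print(dupes)
```

If dupes is empty, plain pivot or unstack is enough; otherwise the rows it shows tell you whether mean, sum, or first makes sense for your data.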

179

answered Nov 09 '22 13:11

jezrael
