Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: transform column's values in independent columns

Tags:

python

pandas

I have Pandas DataFrame which looks like following (df_olymic). I would like the values of column Type to be transformed in independent columns (df_olympic_table)

Original dataframe

In [3]: df_olympic
Out[3]: 
   Country    Type Num
0      USA    Gold  46
1      USA  Silver  37
2      USA  Bronze  38
3       GB    Gold  27
4       GB  Silver  23
5       GB  Bronze  17
6    China    Gold  26
7    China  Silver  18
8    China  Bronze  26
9   Russia    Gold  19
10  Russia  Silver  18
11  Russia  Bronze  19

Transformed dataframe

In [5]: df_olympic_table
Out[5]: 
  Country N_Gold N_Silver N_Bronze
0     USA     46       37       38
1      GB     27       23       17
2   China     26       18       26
3  Russia     19       18       19

What would be the most convenient way to achieve this?

like image 565
TruLa Avatar asked Jan 08 '17 10:01

TruLa


People also ask

When to use pandas transform() function?

When to use Pandas transform () function 1 Transforming values. The first argument func is to specify the function to be used for manipulating data. ... 2 Combining groupby () results. One of the most compelling usages of Pandas transform () is combining grouby () results. ... 3 Filtering data 4 Handling missing values at the group level. ...

How do I map a column in a pandas Dataframe?

You can apply the Pandas .map () method can be applied to a Pandas Series, meaning it can be applied to a Pandas DataFrame column. The map function is interesting because it can take three different shapes. This varies depending on what you pass into the method.

How to transform a Dataframe in Python?

We will do the first a few examples using the following sample dataframe. The first and foremost way of transformation is adding or dropping columns. A new column can be added as follows: We give the values as an array or list and assign a name to the new column. Make sure the size of the array is compatible with the size of the dataframe.

How do you handle missing values in pandas transform?

Handling missing values at the group level Another usage of Pandas transform () is to handle missing values at the group level. Let’s see how this works with an example. In the example above, the data can be split into three groups by name, and each group has missing values. A common solution to replace missing values is to replace NaN with mean.


1 Answers

You can use DataFrame.pivot:

df = df.pivot(index='Country', columns='Type', values='Num')
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       19    19      18
USA          38    46      37

Another solution with DataFrame.set_index and Series.unstack:

df = df.set_index(['Country','Type'])['Num'].unstack()
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       19    19      18
USA          38    46      37

but if get:

ValueError: Index contains duplicate entries, cannot reshape

need pivot_table with some aggreagte function, by default it is np.mean, but you can use sum, first...

#add new row with duplicates value in 'Country' and 'Type'
print (df)
   Country    Type  Num
0      USA    Gold   46
1      USA  Silver   37
2      USA  Bronze   38
3       GB    Gold   27
4       GB  Silver   23
5       GB  Bronze   17
6    China    Gold   26
7    China  Silver   18
8    China  Bronze   26
9   Russia    Gold   19
10  Russia  Silver   18
11  Russia  Bronze   20 < - changed value to 20
11  Russia  Bronze  100 < - add new row with duplicates


df = df.pivot_table(index='Country', columns='Type', values='Num', aggfunc=np.mean)
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       60    19      18 < - Russia get ((100 + 20)/ 2 = 60
USA          38    46      37

Or groupby with aggreagting mean and reshape by unstack:

df = df.groupby(['Country','Type'])['Num'].mean().unstack()
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       60    19      18 < - Russia get ((100 + 20)/ 2 = 60
USA          38    46      37
like image 179
jezrael Avatar answered Nov 09 '22 13:11

jezrael