How to create a new column for each unique component in a given column of a dataframe in Pandas?

I am relatively new to Pandas, so my apologies if the question is not framed properly. I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8)})



     A      B         C         
0  foo    one  0.469112 
1  bar    one -0.282863 
2  foo    two -1.509059
3  bar  three -1.135632  
4  foo    two  1.212112  
5  bar    two -0.173215 
6  foo    one  0.119209 
7  foo  three -1.044236 

What I want to achieve is the following:

   foo_B     foo_C  bar_B     bar_C
0    one  0.469112      -         -
1      -         -    one -0.282863
2    two -1.509059      -         -
3      -         -  three -1.135632
4    two  1.212112      -         -
5      -         -    two -0.173215
6    one  0.119209      -         -
7  three -1.044236      -         -

I don't know which pandas function to use to obtain such a result. Any help would be appreciated.

mubas007 asked Apr 15 '20 21:04


1 Answer

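You can do it by calling set_index on column A with append=True, which keeps the original index, and then unstack, which moves the A level into the columns. Then rename the columns to match your desired output. Cells that have no value for a given A appear as NaN rather than '-'.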

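# Append A to the existing index, then pivot that level into the columns
df_f = df.set_index('A', append=True).unstack()
# Flatten the (original column, A value) pairs into names such as 'foo_B'
df_f.columns = [f'{col[1]}_{col[0]}' for col in df_f.columns]
print(df_f)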
   bar_B  foo_B     bar_C     foo_C
0    NaN    one       NaN -0.230467
1    one    NaN  0.230529       NaN
2    NaN    two       NaN  1.633847
3  three    NaN -0.307068       NaN
4    NaN    two       NaN  0.130438
5    two    NaN  0.459630       NaN
6    NaN    one       NaN -0.791269
7    NaN  three       NaN  0.016670
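
For readers who want to see each step separately, here is a minimal sketch of the same approach. It assumes fixed values for column C (copied from the frame printed in the question) instead of np.random.randn, so the result is reproducible. The names indexed, wide and order are only illustrative. The last step, which reorders the columns to foo_B, foo_C, bar_B, bar_C as in the question, is an addition and not part of the answer above.

```python
import pandas as pd

# Column C uses the values printed in the question so the output is reproducible.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [0.469112, -0.282863, -1.509059, -1.135632,
          1.212112, -0.173215, 0.119209, -1.044236],
})

# Step 1: append A to the row index, giving a MultiIndex of (original index, A).
indexed = df.set_index('A', append=True)

# Step 2: pivot the innermost index level (A) into the columns.
# The columns become pairs such as ('B', 'bar'), ('B', 'foo'), ...
wide = indexed.unstack()

# Step 3: flatten each (original column, A value) pair into '<A value>_<column>'.
wide.columns = [f'{a}_{col}' for col, a in wide.columns]

# Optional: group the columns by A value, in order of first appearance in A.
order = [f'{a}_{col}' for a in df['A'].unique() for col in ['B', 'C']]
wide = wide[order]

print(wide)
```

Rows keep their original index 0 to 7, and each row is filled only in the pair of columns that belongs to its A value.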
Ben.T answered Sep 28 '22 00:09