I am relatively new to pandas, so my sincere apologies if the question is not framed properly. I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8)})
A B C
0 foo one 0.469112
1 bar one -0.282863
2 foo two -1.509059
3 bar three -1.135632
4 foo two 1.212112
5 bar two -0.173215
6 foo one 0.119209
7 foo three -1.044236
What I want to achieve is the following:
foo_B foo_C bar_B bar_C
0 one 0.469112 - -
1 - - one -0.282863
2 two -1.509059 - -
3 - - three -1.135632
4 two 1.212112 - -
5 - - two -0.173215
6 one 0.119209 - -
7 three -1.044236 - -
I don't know exactly which pandas function to use to obtain this result. Kindly help.
You can do it with set_index on column A with append=True to keep the original index, and then unstack. Then rename the columns as wanted in your output.
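# Move 'A' into the index next to the original row index, then unstack it into the columns
df_f = df.set_index('A', append=True).unstack()
# Flatten the (column, A value) MultiIndex into names like 'foo_B'
df_f.columns = [f'{col[1]}_{col[0]}' for col in df_f.columns]
print(df_f)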
bar_B foo_B bar_C foo_C
0 NaN one NaN -0.230467
1 one NaN 0.230529 NaN
2 NaN two NaN 1.633847
3 three NaN -0.307068 NaN
4 NaN two NaN 0.130438
5 two NaN 0.459630 NaN
6 NaN one NaN -0.791269
7 NaN three NaN 0.016670
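The output above still differs from the layout in the question in two ways: the bar columns come first, and missing cells show NaN instead of '-'. Below is a minimal sketch that also reorders the columns and fills the gaps. It assumes fixed values for C (copied from the question's table) in place of np.random.randn(8) so the result is reproducible, and the hard-coded column order assumes exactly the two groups foo and bar.

```python
import numpy as np
import pandas as pd

# Same data as the question, but with fixed C values (assumption) so the
# result is reproducible instead of depending on np.random.randn(8).
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': [0.469112, -0.282863, -1.509059, -1.135632,
                         1.212112, -0.173215, 0.119209, -1.044236]})

# Append 'A' to the existing row index, then unstack that level so the
# columns become a MultiIndex of (B or C, bar or foo).
wide = df.set_index('A', append=True).unstack()

# Flatten ('B', 'foo') into 'foo_B', and so on.
wide.columns = [f'{group}_{value_col}' for value_col, group in wide.columns]

# Reorder to match the question (foo first) and show '-' in the cells
# that belong to the other group.
wide = wide[['foo_B', 'foo_C', 'bar_B', 'bar_C']].fillna('-')
print(wide)
```

Filling with '-' turns the C columns into object dtype, so skip the fillna step if you still need to do arithmetic on them. With more groups than foo and bar, build the column list programmatically rather than typing it out.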