From a dataframe, I want to create a new dataframe that gets an extra column each time the key in the first column has already been seen, BUT I don't know in advance how many columns I will need:
pd.DataFrame([["John","guitar"],["Michael","football"],["Andrew","running"],["John","dancing"],["Andrew","cars"]])
and I want:
pd.DataFrame([["John","guitar","dancing"],["Michael","football",None],["Andrew","running","cars"]])
without knowing how many columns I should create at the start.
df = pd.DataFrame([["John","guitar"],["Michael","football"],["Andrew","running"],["John","dancing"],["Andrew","cars"]], columns = ['person','hobby'])
You can groupby person and search for the unique values in hobby. Then use .apply(pd.Series) to expand the lists into columns:
df.groupby('person').hobby.unique().apply(pd.Series).reset_index()
    person         0        1
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football      NaN
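To see why this works, it can help to look at the intermediate result. The snippet below is a minimal sketch that splits the one-liner into two steps; the variable names hobbies and wide are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["person", "hobby"],
)

# Step 1: a Series indexed by person, holding one array of distinct
# hobbies per person (groupby sorts the group keys by default).
hobbies = df.groupby("person")["hobby"].unique()

# Step 2: turn each array into a row. Rows with fewer hobbies are
# padded with missing values, so the column count is found automatically.
wide = hobbies.apply(pd.Series).reset_index()
print(wide)
```

The number of columns is simply the length of the longest array from step 1, which is why it does not have to be known up front.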
For a large dataframe, try this more efficient alternative:
df = df.groupby('person').hobby.unique()
df = pd.DataFrame(df.values.tolist(), index=df.index).reset_index()
This does essentially the same thing, but avoids looping over rows to build a pd.Series for each one.
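As a self-contained sketch of this faster variant, the helper below wraps the idea in a function. The name spread_unique and its parameters are hypothetical, chosen here for illustration. It also converts each group to a plain list before building the frame, which is an assumption made for robustness rather than something the original answer does.

```python
import pandas as pd

def spread_unique(frame, key, value):
    """Hypothetical helper: one row per `key`, one column per distinct `value`."""
    grouped = frame.groupby(key)[value].unique()
    # Build the frame in one call from plain lists; pandas pads the
    # shorter rows with missing values instead of raising an error.
    wide = pd.DataFrame([list(v) for v in grouped], index=grouped.index)
    return wide.reset_index()

df = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["person", "hobby"],
)

result = spread_unique(df, "person", "hobby")
print(result)
```

Unlike the snippet above, this version leaves df untouched and returns a new dataframe, so the original data is still available afterwards.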