My objective is simple, but I'm not sure if it's possible. Here's a reproducible example. Can you go from this:
import pandas as pd
import numpy as np

raw_data = {'score': [1, 3, 4, 4, 1, 2, 2, 4, 4, 2],
            'player': ['Miller', 'Jacobson', 'Ali', 'George', 'Cooze', 'Wilkinson', 'Lewis', 'Lewis', 'Lewis', 'Jacobson']}
df = pd.DataFrame(raw_data, columns=['score', 'player'])
df
score player
0 1 Miller
1 3 Jacobson
2 4 Ali
3 4 George
4 1 Cooze
5 2 Wilkinson
6 2 Lewis
7 4 Lewis
8 4 Lewis
9 2 Jacobson
To this:
score col_1 col_2 col_3 col_4
score
1 2 Miller Cooze n/a n/a
2 3 Wilkinson Lewis Jacobson n/a
3 1 Jacobson n/a n/a n/a
4 4 Ali George Lewis Lewis
via a groupby?
I can get this far: df.groupby(['score']).agg({'score': np.size}), but I can't work out how to create the new columns from the player values.
As an aside on counting: use pandas DataFrame.groupby() to group the rows by a column and the count() method to get the count for each group, ignoring None and NaN values; it works with non-floating dtypes as well. count() should be used when you want the frequency of valid values present in a column with respect to a specified grouping column; value_counts() should be used to find the frequencies of a Series.
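For instance, with the df above, a quick illustration of that distinction:

# count(): non-null values per group
df.groupby('score')['player'].count()
# value_counts(): frequency of each value in a Series
df['player'].value_counts()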
I can duplicate your output with
Option 1
# group the player column by score
g = df.groupby('score').player
# size() counts each group; apply(list) then apply(pd.Series) spreads the players across columns
g.size().to_frame('score').join(g.apply(list).apply(pd.Series).add_prefix('col_'))
score col_0 col_1 col_2 col_3
score
1 2 Miller Cooze NaN NaN
2 3 Wilkinson Lewis Jacobson NaN
3 1 Jacobson NaN NaN NaN
4 4 Ali George Lewis Lewis
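The key step here is g.apply(list), which collects each group's players into one list per score; apply(pd.Series) then spreads those lists across columns. The intermediate result, derived from the df above, looks like this:

g.apply(list)

score
1                    ['Miller', 'Cooze']
2     ['Wilkinson', 'Lewis', 'Jacobson']
3                           ['Jacobson']
4    ['Ali', 'George', 'Lewis', 'Lewis']
Name: player, dtype: object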
Option 2
# aggregate the group size and collect the players into tuples
d1 = df.groupby('score').agg({'score': 'size', 'player': lambda x: tuple(x)})
# spread the tuples into columns; index=d1.index keeps the join aligned on score
d1.join(pd.DataFrame(d1.pop('player').values.tolist(), index=d1.index).add_prefix('col_'))
score col_0 col_1 col_2 col_3
score
1 2 Miller Cooze NaN NaN
2 3 Wilkinson Lewis Jacobson NaN
3 1 Jacobson NaN NaN NaN
4 4 Ali George Lewis Lewis
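If you'd rather have the columns numbered from col_1, as in your desired output, a cumcount/pivot variant also works. This is a sketch, assuming a reasonably recent pandas; n is just a temporary helper column:

# number each player within its score group, starting at 1
out = (df.assign(n=df.groupby('score').cumcount() + 1)
         .pivot(index='score', columns='n', values='player')
         .add_prefix('col_'))
# prepend the per-group counts as the 'score' column
out.insert(0, 'score', df.groupby('score').size())
out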