
Pandas: Groupby to create table with count and count values

My objective is simple, but I'm not sure it's possible. Here is a reproducible example.

Can you go from this:

import numpy as np
import pandas as pd

raw_data = {'score': [1, 3, 4, 4, 1, 2, 2, 4, 4, 2],
        'player': ['Miller', 'Jacobson', 'Ali', 'George', 'Cooze', 'Wilkinson', 'Lewis', 'Lewis', 'Lewis', 'Jacobson']}
df = pd.DataFrame(raw_data, columns=['score', 'player'])
df

    score   player
0   1       Miller
1   3       Jacobson
2   4       Ali
3   4       George
4   1       Cooze
5   2       Wilkinson
6   2       Lewis
7   4       Lewis
8   4       Lewis
9   2       Jacobson

To this:

        score    col_1       col_2       col_3       col_4     
score   
1       2        Miller      Cooze       n/a         n/a
2       3        Wilkinson   Lewis       Jacobson    n/a
3       1        Jacobson    n/a         n/a         n/a
4       4        Ali         George      Lewis       Lewis

Via a groupby?

I can get as far as df.groupby(['score']).agg({'score': np.size}), but I can't work out how to create the new columns that hold the player values.

RDJ asked May 30 '17 20:05



1 Answer

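I can reproduce your output with either option below. The player columns are numbered from col_0 rather than col_1, and missing entries show as NaN rather than n/a.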

Option 1

g = df.groupby('score').player
g.size().to_frame('score').join(g.apply(list).apply(pd.Series).add_prefix('col_'))

       score      col_0   col_1     col_2  col_3
score                                           
1          2     Miller   Cooze       NaN    NaN
2          3  Wilkinson   Lewis  Jacobson    NaN
3          1   Jacobson     NaN       NaN    NaN
4          4        Ali  George     Lewis  Lewis
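
To make the one-liner easier to follow, here is a sketch that splits Option 1 into separate steps and also shifts the labels so they start at col_1, as in the question. The intermediate names (counts, players, wide, result) are made up for illustration, and the relabelling step is an addition, not part of the one-liner above.

```python
import pandas as pd

raw_data = {'score': [1, 3, 4, 4, 1, 2, 2, 4, 4, 2],
            'player': ['Miller', 'Jacobson', 'Ali', 'George', 'Cooze',
                       'Wilkinson', 'Lewis', 'Lewis', 'Lewis', 'Jacobson']}
df = pd.DataFrame(raw_data, columns=['score', 'player'])

g = df.groupby('score').player

# Number of rows in each score group, stored in a column named 'score'.
counts = g.size().to_frame('score')

# One Python list of players per score, in original row order.
players = g.apply(list)

# Expand each list into its own row; shorter lists leave missing values.
wide = players.apply(pd.Series)

# Relabel the positions 0, 1, 2, ... as col_1, col_2, col_3, ...
wide.columns = [f'col_{i + 1}' for i in wide.columns]

# Both frames are indexed by score, so join aligns them row by row.
result = counts.join(wide)
print(result)
```

Applying pd.Series to every row is convenient but can be slow on large frames, which is one reason to prefer building the frame from a list of tuples as in Option 2.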

Option 2

d1 = df.groupby('score').agg({'score': 'size', 'player': lambda x: tuple(x)})
# Reuse d1's index so the expanded player columns line up with the right scores
d1.join(pd.DataFrame(d1.pop('player').values.tolist(), index=d1.index).add_prefix('col_'))

       score      col_0   col_1     col_2  col_3
score                                           
1          2     Miller   Cooze       NaN    NaN
2          3  Wilkinson   Lewis  Jacobson    NaN
3          1   Jacobson     NaN       NaN    NaN
4          4        Ali  George     Lewis  Lewis
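
A further route, not one of the two options above, is to number the rows within each group with cumcount and pivot those numbers into columns with unstack. This is only a sketch under that assumption. The names pos, wide, counts and result are illustrative, and the + 1 is there so the labels start at col_1 as in the question.

```python
import pandas as pd

raw_data = {'score': [1, 3, 4, 4, 1, 2, 2, 4, 4, 2],
            'player': ['Miller', 'Jacobson', 'Ali', 'George', 'Cooze',
                       'Wilkinson', 'Lewis', 'Lewis', 'Lewis', 'Jacobson']}
df = pd.DataFrame(raw_data, columns=['score', 'player'])

# Position of each row within its score group, starting at 1.
pos = df.groupby('score').cumcount() + 1

# Index by (score, position), then move the positions into columns:
# one row per score, one column per position, NaN where a group is shorter.
wide = (df.set_index(['score', pos])['player']
          .unstack()
          .add_prefix('col_'))

# Group sizes as a 'score' column, joined on the shared score index.
counts = df.groupby('score').size().to_frame('score')
result = counts.join(wide)
print(result)
```

This avoids building intermediate lists or tuples per group, at the cost of a slightly less obvious reshaping step.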

piRSquared answered Sep 21 '22 11:09