pandas - Counting occurrences of a value in a DataFrame per each unique value in another column

Supposing that I have a DataFrame along the lines of:

    term      score
0   this          0
1   that          1
2   the other     3
3   something     2
4   anything      1
5   the other     2
6   that          2
7   this          0
8   something     1
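
For reproducibility, here is a minimal sketch that builds the frame above. It assumes pandas is imported as pd and that score is stored as integers.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "term": ["this", "that", "the other", "something", "anything",
             "the other", "that", "this", "something"],
    "score": [0, 1, 3, 2, 1, 2, 2, 0, 1],
})
print(df)
```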

How would I count how often each score value occurs for each unique value in the term column, producing a result like:

    term      score 0     score 1     score 2     score 3
0   this            2           0           0           0
1   that            0           1           1           0
2   the other       0           0           1           1
3   something       0           1           1           0
4   anything        0           1           0           0

Related questions I've read here include "Python Pandas counting and summing specific conditions" and "COUNTIF in pandas python over multiple columns with multiple conditions", but neither is quite what I'm looking to do. pivot_table, as mentioned in another question, seems like it could be relevant, but I'm held back by my lack of experience and the brevity of the pandas documentation. Thanks for any suggestions.

Scott Martin asked Dec 10 '22 05:12


2 Answers

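You can also use get_dummies with set_index and then sum over the index level. (sum(level=0) was removed in pandas 2.0, so group by the level instead; sort=False keeps the terms in order of first appearance.)

(pd.get_dummies(df.set_index('term'), columns=['score'], prefix_sep=' ')
   .groupby(level=0, sort=False)
   .sum()
   .reset_index())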

Output:

        term  score 0  score 1  score 2  score 3
0       this        2        0        0        0
1       that        0        1        1        0
2  the other        0        0        1        1
3  something        0        1        1        0
4   anything        0        1        0        0
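
To see why this works, it can help to look at the intermediate one-hot frame before it is summed. The sketch below rebuilds the question's sample data. The names dummies and counts are only illustrative, and dtype=int is an added assumption so the indicators show as 0/1 rather than booleans. It groups on the index level instead of calling sum(level=...), because that keyword is not available in pandas 2.0 and later.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "term": ["this", "that", "the other", "something", "anything",
             "the other", "that", "this", "something"],
    "score": [0, 1, 3, 2, 1, 2, 2, 0, 1],
})

# One indicator column per distinct score, still one row per original row.
dummies = pd.get_dummies(df.set_index("term"), columns=["score"],
                         prefix_sep=" ", dtype=int)
print(dummies.head(3))

# Adding up the indicator rows that share a term yields the counts.
# sort=False keeps the terms in order of first appearance.
counts = dummies.groupby(level=0, sort=False).sum().reset_index()
print(counts)
```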

Scott Boston answered Dec 12 '22 17:12


Use groupby with size, reshape with unstack, and finally add_prefix:

df = df.groupby(['term','score']).size().unstack(fill_value=0).add_prefix('score ')

Or use crosstab:

df = pd.crosstab(df['term'],df['score']).add_prefix('score ')

Or pivot_table:

df = (df.pivot_table(index='term',columns='score', aggfunc='size', fill_value=0)
        .add_prefix('score '))

print(df)
score      score 0  score 1  score 2  score 3
term                                         
anything         0        1        0        0
something        0        1        1        0
that             0        1        1        0
the other        0        0        1        1
this             2        0        0        0
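
Note that the result above is sorted alphabetically and keeps term as the index, whereas the question shows term as a regular column in order of first appearance. If that layout matters, one possible way to get it is sketched below using the crosstab variant. The names counts, order and result are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "term": ["this", "that", "the other", "something", "anything",
             "the other", "that", "this", "something"],
    "score": [0, 1, 3, 2, 1, 2, 2, 0, 1],
})

counts = pd.crosstab(df["term"], df["score"]).add_prefix("score ")

# Terms in order of first appearance in the original frame.
order = df["term"].drop_duplicates().tolist()

result = (counts.reindex(order)                           # undo the alphabetical sort
                .rename_axis(index="term", columns=None)  # drop the "score" columns label
                .reset_index())                           # turn term back into a column
print(result)
```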

jezrael answered Dec 12 '22 18:12