 

Pandas Groupby: Count and mean combined

I'm working with pandas to summarise a data frame as a count of certain categories, along with the mean sentiment score for each category.

I have a table of strings, each with its own sentiment score. I want to group them by text source, showing how many posts each source has as well as the average sentiment of those posts.

My (simplified) data frame looks like this:

source    text              sent
--------------------------------
bar       some string       0.13
foo       alt string        -0.8
bar       another str       0.7
foo       some text         -0.2
foo       more text         -0.5

The output from this should be something like this:

source    count     mean_sent
-----------------------------
foo       3         -0.5
bar       2         0.415

I think the answer is something along the lines of:

df['sent'].groupby(df['source']).mean() 

But this only gives each source and its mean, with no column headers.
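A likely reason for the missing headers is that the attempt above returns a Series indexed by source rather than a DataFrame. The sketch below rebuilds the question's sample data and shows one way to get both numbers at once: passing a list of function names to agg() on the grouped column, which returns a DataFrame with one column per function. The variable names means and summary are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's simplified table
df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# The original attempt: a Series indexed by 'source', so there is only
# one column of values and no 'count' / 'mean_sent' headers.
means = df["sent"].groupby(df["source"]).mean()
print(type(means).__name__)  # prints "Series"

# A list of function names gives a DataFrame with one column per function.
summary = df.groupby("source")["sent"].agg(["count", "mean"])
print(summary)
```

Note that "count" here counts non-null sent values; with the sample data there are no missing values, so it matches the number of posts.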

Lewis Anderson asked Dec 08 '16 12:12



People also ask

How do I count the number of rows in each group of a Groupby object?

You can use pandas DataFrame.groupby().size() to count the rows in each group combination. DataFrame.groupby().count() is similar, but it counts the non-null values in each column for each group, so the two can differ when data is missing.
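A small sketch of the difference between size() and count(), using hypothetical data modelled on the question with one sentiment score deliberately set to NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second 'bar' row has a missing sentiment score
df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "sent": [0.13, -0.8, np.nan, -0.2, -0.5],
})

# size(): number of rows in each group, missing values included
rows_per_group = df.groupby("source").size()

# count(): number of non-null values per column in each group
non_null_per_group = df.groupby("source").count()

print(rows_per_group)
print(non_null_per_group)
```

For counting posts per source, size() is usually the safer choice, since a missing score does not remove the post from the count.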

How do you use Groupby and aggregate?

In SQL, the GROUP BY clause groups together rows that have the same value in the specified column(s), so that an aggregate function such as MAX() or SUM() can be computed for each group. It is used together with a SELECT statement.

How do you use groupby in pandas to compute a sum?

groupby() involves a combination of splitting the object, applying a function, and combining the results. It can be used to group large amounts of data and compute an operation such as sum() on each group. The DataFrame.sum() method returns the sum of the values over the requested axis.
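To make the split-apply-combine description concrete, here is a minimal sketch on the question's sample data. The built-in groupby sum is compared with the same three steps written out by hand; the manual loop is for illustration only, not how you would normally compute it.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# Built-in: split by 'source', sum each group, combine into a Series
sums = df.groupby("source")["sent"].sum()

# The same steps by hand (illustration only)
manual = {}
for key, group in df.groupby("source"):   # split into one sub-frame per source
    manual[key] = group["sent"].sum()     # apply sum() to each sub-frame
manual = pd.Series(manual, name="sent")   # combine the results into one Series

print(sums)
```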


1 Answer

You can use groupby with aggregate:

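import pandas as pd

# The question's sample data
df = pd.DataFrame({'source': ['bar', 'foo', 'bar', 'foo', 'foo'],
                   'text': ['some string', 'alt string', 'another str', 'some text', 'more text'],
                   'sent': [0.13, -0.8, 0.7, -0.2, -0.5]})

df = (df.groupby('source')
        .agg({'text': 'size', 'sent': 'mean'})  # rows per source, mean sentiment
        .rename(columns={'text': 'count', 'sent': 'mean_sent'})
        .reset_index())                         # turn 'source' back into a column
print(df)

  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500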
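On newer pandas versions (named aggregation was added in 0.25), the rename step can be folded into agg() itself: each keyword becomes an output column, given as (input column, aggregation function). This is a sketch of that alternative, not part of the original answer; the result name out is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# Named aggregation: output column name = (input column, function)
out = (
    df.groupby("source")
      .agg(count=("sent", "size"), mean_sent=("sent", "mean"))
      .reset_index()  # make 'source' a regular column again
)
print(out)
```

Both columns are computed from sent here; using "size" means the count includes rows whose score is missing.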
jezrael answered Sep 22 '22 18:09
