I'm working with pandas and trying to summarise a data frame as a count of certain categories, together with the mean sentiment score for each category.
I have a table of strings, each with a sentiment score, and I want to group by text source, reporting how many posts each source has and the average sentiment of those posts.
My (simplified) data frame looks like this:
source  text          sent
--------------------------------
bar     some string    0.13
foo     alt string    -0.8
bar     another str    0.7
foo     some text     -0.2
foo     more text     -0.5
The output from this should be something like this:
source  count  mean_sent
-----------------------------
foo     3      -0.5
bar     2       0.415
The answer is somewhere along the lines of:
df['sent'].groupby(df['source']).mean()
Yet this only gives each source and its mean, with no column headers.
You can use DataFrame.groupby() followed by count() or size() to group on one or more columns and compute a per-group count: size() returns the number of rows in each group, while count() returns the number of non-null values in each column.
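To make the difference between the two concrete, here is a minimal sketch built on the question's data. Note that one text value has been replaced with None purely for illustration (that is my assumption, not part of the original data), so that size() and count() disagree for foo.

```python
import pandas as pd

# Question's sample data, with one 'text' value set to None
# (an assumption made only to show how count() treats missing values).
df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", None],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# size() counts every row in each group, missing values included.
sizes = df.groupby("source").size()

# count() counts only the non-null values of the selected column.
counts = df.groupby("source")["text"].count()

print(sizes)
print(counts)
```

If every post should be counted regardless of whether its text is missing, size is probably the safer choice.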
In SQL, the GROUP BY clause groups together rows that share the same value in the specified column(s), usually so that an aggregate function such as MAX() or SUM() can be applied to each group. It is used with the SELECT statement.
In pandas, groupby() involves a combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations such as sum() on each group. DataFrame.sum() returns the sum of the values over the requested axis.
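As a rough illustration of the split, apply, and combine steps, the sketch below does them by hand on the question's data and then compares with the built-in call. The variable names (pieces, means, manual, builtin) are just for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# Split: one sub-frame per distinct value of 'source'.
pieces = {source: group for source, group in df.groupby("source")}

# Apply: compute the mean sentiment of each sub-frame.
means = {source: group["sent"].mean() for source, group in pieces.items()}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(means, name="sent").rename_axis("source")

# The built-in one-liner performs the same three steps internally.
builtin = df.groupby("source")["sent"].mean()

print(manual)
print(builtin)
```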
You can use groupby with aggregate:
df = df.groupby('source') \
       .agg({'text':'size', 'sent':'mean'}) \
       .rename(columns={'text':'count','sent':'mean_sent'}) \
       .reset_index()
print (df)

  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
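As a possible alternative, and assuming pandas 0.25 or newer, named aggregation lets you set the output column names directly inside agg, so the separate rename step is not needed. This is only a sketch of that variant, not the answer's original method.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["bar", "foo", "bar", "foo", "foo"],
    "text": ["some string", "alt string", "another str", "some text", "more text"],
    "sent": [0.13, -0.8, 0.7, -0.2, -0.5],
})

# Each keyword is an output column name; each value is (input column, aggregation).
summary = (
    df.groupby("source")
      .agg(count=("text", "size"), mean_sent=("sent", "mean"))
      .reset_index()
)

print(summary)
```

The result should have the same three columns (source, count, mean_sent) as the version with rename.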