I am trying to count the rows that share the same 'kind' in a data frame:
import pandas as pd
items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'),
('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'),
('bb', 'bb text 4'),
('cccc','cccc text 1'), ('cccc','cccc text 2'),
('dd', 'dd text 1'),
('e', 'e text 1'),
('fff', 'fff text 1'),
]
df = pd.DataFrame(items, columns=['kind', 'msg'])
df
kind msg
0 aaa aaa text 1
1 aaa aaa text 2
2 aaa aaa text 3
3 bb bb text 1
4 bb bb text 2
5 bb bb text 3
6 bb bb text 4
7 cccc cccc text 1
8 cccc cccc text 2
9 dd dd text 1
10 e e text 1
11 fff fff text 1
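This code counts the rows per kind, but it keeps only the 'kind' and 'count' columns:
# Assign to a new name so the original df is not overwritten;
# re-running the cell on an overwritten df would report a count of 1 for every kind.
counts = df[['kind']].groupby(['kind'])['kind'] \
.count() \
.reset_index(name='count') \
.sort_values(['count'], ascending=False) \
.head(5)
counts
Resulting in:
kind count
1 bb 4
0 aaa 3
2 cccc 2
3 dd 1
4 e 1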
However, how can one get a data frame with all the original columns plus a 'count' column, so that the result has the columns 'kind', 'msg', 'count' in this order?
Also, how can the resulting data frame be sorted in descending order of count?
Groupby count in pandas is done with the groupby() function. For a single column or for multiple columns, the count can be computed in several ways, including groupby().count() and groupby().agg('count').
Using agg(): agg() takes 'count' as input and computes the count per group; reset_index() then turns the group keys back into ordinary columns, so the result is a regular data frame. For example, grouping by the "State" and "Product" columns and counting "Sales":
df1.groupby(['State','Product'])['Sales'].agg('count').reset_index()
You can also use DataFrame.groupby().count() or DataFrame.groupby().size() to get a row count for each group combination. Note that count() counts the non-null values of each column, while size() counts every row in the group.
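To make the difference between the two aggregates concrete, here is a minimal sketch. The frame df1 and its values are made up for illustration; only the column names "State", "Product" and "Sales" come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; the values are invented for this example.
df1 = pd.DataFrame({
    "State": ["CA", "CA", "CA", "NY", "NY"],
    "Product": ["pen", "pen", "ink", "pen", "pen"],
    "Sales": [10.0, np.nan, 5.0, 7.0, 3.0],
})

grouped = df1.groupby(["State", "Product"])["Sales"]

# agg('count') skips missing values in the counted column.
non_null = grouped.agg("count").reset_index(name="count")

# size() counts every row in the group, missing or not.
all_rows = grouped.size().reset_index(name="size")

print(non_null)
print(all_rows)
```

If the counted column can contain missing values and you want the number of rows per group, size() is probably the safer choice.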
IIUC
In [247]: df['count'] = df.groupby('kind').transform('count')
In [248]: df
Out[248]:
kind msg count
0 aaa aaa text 1 3
1 aaa aaa text 2 3
2 aaa aaa text 3 3
3 bb bb text 1 4
4 bb bb text 2 4
5 bb bb text 3 4
6 bb bb text 4 4
7 cccc cccc text 1 2
8 cccc cccc text 2 2
9 dd dd text 1 1
10 e e text 1 1
11 fff fff text 1 1
sorting:
In [249]: df.sort_values('count', ascending=False)
Out[249]:
kind msg count
3 bb bb text 1 4
4 bb bb text 2 4
5 bb bb text 3 4
6 bb bb text 4 4
0 aaa aaa text 1 3
1 aaa aaa text 2 3
2 aaa aaa text 3 3
7 cccc cccc text 1 2
8 cccc cccc text 2 2
9 dd dd text 1 1
10 e e text 1 1
11 fff fff text 1 1
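Putting both steps together, a self-contained sketch might look like the following. It selects a single column before transform (so the result is a Series regardless of how many other columns the frame has) and passes kind="mergesort", which is a stable sort; both choices are my own additions rather than part of the answer above.

```python
import pandas as pd

items = [("aaa", "aaa text 1"), ("aaa", "aaa text 2"), ("aaa", "aaa text 3"),
         ("bb", "bb text 1"), ("bb", "bb text 2"), ("bb", "bb text 3"),
         ("bb", "bb text 4"),
         ("cccc", "cccc text 1"), ("cccc", "cccc text 2"),
         ("dd", "dd text 1"), ("e", "e text 1"), ("fff", "fff text 1")]
df = pd.DataFrame(items, columns=["kind", "msg"])

# Broadcast the per-group count back onto every row of the group.
df["count"] = df.groupby("kind")["kind"].transform("count")

# A stable sort keeps the original row order among rows with equal counts.
result = df.sort_values("count", ascending=False, kind="mergesort")
print(result)
```

Because the new column is appended last, the columns come out as 'kind', 'msg', 'count' without any reordering.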
Here is simple code that counts the frequency of each kind and adds it as a new column to the data frame:
df['count'] = df.groupby('kind')['kind'].transform('count')
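As an alternative sketch (not taken from the answers here), the same per-row counts can be obtained by mapping each kind to its frequency from value_counts(). The small frame below is a shortened, made-up version of the question's data.

```python
import pandas as pd

# Shortened illustrative data with the same column names as the question.
df = pd.DataFrame({
    "kind": ["aaa", "bb", "bb", "e"],
    "msg": ["aaa text 1", "bb text 1", "bb text 2", "e text 1"],
})

# Approach from the answer: per-group count broadcast to each row.
via_transform = df.groupby("kind")["kind"].transform("count")

# Alternative: look up each row's kind in the frequency table.
via_map = df["kind"].map(df["kind"].value_counts())

df["count"] = via_map
print(df)
```

Both approaches give the same numbers on this data; transform generalizes more naturally when grouping by several columns.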