 

Pandas Groupby count on multiple columns for specific string values only

Tags:

python

pandas

I have a data frame like this

import pandas as pd

dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])


Now I want an aggregation of the 'TRUE' actions per day, which should look like this:

            Action1  Action2
date
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1

I applied groupby, sum, count, etc., but nothing is working for me, because I have to aggregate multiple columns. I don't want to split the table into multiple dataframes, resolve each one individually, and merge them back into one. Can someone please suggest a smart way to do it?

asked Mar 24 '21 by Vineet

People also ask

How do I get unique values from GROUP BY pandas?

To count unique values per group in pandas, use df.groupby('column_name').nunique(); plain count() only counts non-null rows per group.
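For example, a minimal sketch with a made-up frame (the column names here are illustrative, not from the question):

import pandas as pd

# Hypothetical data just to show distinct counts per group
teams = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                      'player': ['x', 'y', 'x', 'x']})

teams.groupby('team')['player'].nunique()
# team
# A    2
# B    1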

How do you count specific values in pandas?

We can count occurrences with the value_counts() method. It counts how often each value appears, either across the entire dataframe or within a particular column.
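As a quick illustration with the dummy frame from the question (assuming it has been built as shown above):

# How often each string appears in one column
dummy['Action1'].value_counts()
# TRUE     5
# FALSE    2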

How do you count after GROUP BY in pandas?

Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then call count() to get the count for each group, ignoring None and NaN values.
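With the question's dummy frame, that looks like this (a sketch, assuming pandas is imported as pd):

# Number of non-null values per column within each date group
dummy.groupby('date').count()
#             Action1  Action2
# date
# 01/09/2020        2        2
# 02/09/2020        2        2
# 03/09/2020        3        3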

Can you GROUP BY multiple columns in pandas?

Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
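A minimal sketch of grouping on more than one column (the frame below is made up for illustration):

# Hypothetical sales data
sales = pd.DataFrame({'region': ['N', 'N', 'S', 'S'],
                      'product': ['a', 'a', 'a', 'b'],
                      'qty': [1, 2, 3, 4]})

# Sum of qty for every (region, product) pair
sales.groupby(['region', 'product'])['qty'].sum()
# region  product
# N       a          3
# S       a          3
#         b          4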

How to count occurrences in column in pandas groupby?

Pandas GroupBy – count occurrences in a column:
1. Import the module
2. Create or import the data frame
3. Apply groupby
4. Use either of the two counting methods
5. Display the result
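The snippet does not say which two methods it means; size() and count() are the usual pair, sketched here on the question's frame:

dummy.groupby('date').size()               # rows per group, NaN included
dummy.groupby('date')['Action1'].count()   # non-null values per group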

How to group the rows in a Dataframe in Python?

Use pandas DataFrame.groupby() to group the rows by a column and the count() method to get the count for each group, ignoring None and NaN values. It works with non-floating-point data as well. The example groups on a Courses column and counts how many times each value is present.
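The original example data and its output are not included here; a hypothetical Courses frame along the same lines might look like this:

# Made-up data standing in for the snippet's Courses example
courses = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Spark', 'Pandas'],
                        'Fee': [20000, 25000, 22000, 30000]})

courses.groupby('Courses')['Fee'].count()
# Courses
# Pandas     1
# PySpark    1
# Spark      2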

How do I Group and aggregate by multiple columns in pandas?

Pandas: How to Group and Aggregate by Multiple Columns. Often you may want to group and aggregate by multiple columns of a pandas DataFrame. Fortunately this is easy to do using the pandas groupby() and agg() functions. This tutorial explains several examples of how to use these functions in practice.
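A short sketch of grouping by two columns and applying different aggregations to the rest (hypothetical data again):

stats = pd.DataFrame({'region': ['N', 'N', 'N', 'S', 'S'],
                      'product': ['a', 'a', 'b', 'a', 'a'],
                      'qty': [1, 2, 3, 4, 5],
                      'price': [10, 20, 30, 40, 60]})

# One aggregation per column: total qty, average price
stats.groupby(['region', 'product']).agg({'qty': 'sum', 'price': 'mean'})
#                 qty  price
# region product
# N      a          3   15.0
#        b          3   30.0
# S      a          9   50.0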

How does groupby work in pandas Dataframe?

Each iteration over the groupby object returns two values. The first value is the identifier of the group, i.e. the value of the column(s) on which the rows were grouped. The second value is the group itself, a pandas DataFrame object.
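A small sketch of that iteration on the question's dummy frame:

# Each pass yields (group key, sub-DataFrame)
for day, group in dummy.groupby('date'):
    print(day, len(group))
# 01/09/2020 2
# 02/09/2020 2
# 03/09/2020 3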


3 Answers

'TRUE' and 'FALSE' in your dummy df are strings; you can convert them to int and sum:

dummy.replace({'TRUE': 1, 'FALSE': 0}).groupby('date', as_index=False).sum()

    date        Action1 Action2
0   01/09/2020  2       1
1   02/09/2020  1       1
2   03/09/2020  2       1
answered Oct 22 '22 by Vaishali


You can also try:

dummy.set_index(['date']).eq('TRUE').sum(level='date')

Output:

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1
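A side note, not from the answer itself: the level= argument of sum() was deprecated around pandas 1.3 and later removed, so on recent releases the same idea would be written with an explicit groupby, which should give the same table:

dummy.set_index('date').eq('TRUE').groupby(level='date').sum()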
answered Oct 22 '22 by Quang Hoang


Anyone seeing this answer should look at the answers by @QuangHoang or @Vaishali; they are much better answers. I can't control what the OP chooses, but you should go upvote those answers.

Inspired by @QuangHoang

dummy.iloc[:, 1:].eq('TRUE').groupby(dummy.date).sum()

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1

OLD ANSWER

Fix your dataframe such that it has actual True/False values

from ast import literal_eval

dummy = dummy.assign(**dummy[['Action1', 'Action2']].applymap(str.title).applymap(literal_eval))
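As a side note not in the original answer: applymap has since been deprecated in favour of DataFrame.map (around pandas 2.1), and if the columns really hold the strings 'TRUE'/'FALSE', a plain comparison avoids literal_eval entirely:

# Alternative: convert the string flags to real booleans by comparison
dummy[['Action1', 'Action2']] = dummy[['Action1', 'Action2']].eq('TRUE')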

Then use groupby

dummy.groupby('date').sum()

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1
answered Oct 22 '22 by piRSquared