 

Pandas Groupby count on multiple columns for specific string values only

Tags:

python

pandas

I have a data frame like this

import pandas as pd

dummy = pd.DataFrame([
    ('01/09/2020', 'TRUE', 'FALSE'),
    ('01/09/2020', 'TRUE', 'TRUE'),
    ('02/09/2020', 'FALSE', 'TRUE'),
    ('02/09/2020', 'TRUE', 'FALSE'),
    ('03/09/2020', 'FALSE', 'FALSE'),
    ('03/09/2020', 'TRUE', 'TRUE'),
    ('03/09/2020', 'TRUE', 'FALSE')], columns=['date', 'Action1', 'Action2'])


Now I want an aggregation of the 'TRUE' actions per day, which should look like this:

            Action1  Action2
date
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1

I applied groupby, sum, count, etc., but nothing is working for me, because I have to aggregate multiple columns. I don't want to split the table into multiple dataframes, resolve each one individually, and merge them back into one. Can someone please suggest a smart way to do it?

asked Mar 24 '21 by Vineet

People also ask

How do I get unique values from GROUP BY pandas?

To count unique values per group in pandas, use df.groupby('column_name').nunique(); plain count() only counts non-null rows per group.
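For example, a minimal sketch with a made-up frame (the column names here are illustrative, not from the question):

import pandas as pd

# Hypothetical data just to show distinct counts per group
teams = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                      'player': ['x', 'y', 'x', 'x']})

teams.groupby('team')['player'].nunique()
# team
# A    2
# B    1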

How do you count specific values in pandas?

We can count occurrences with the value_counts() method. It counts how often each value appears, either across the entire dataframe or within a particular column.
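As a quick illustration with the dummy frame from the question (assuming it has been built as shown above):

# How often each string appears in one column
dummy['Action1'].value_counts()
# TRUE     5
# FALSE    2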

How do you count after GROUP BY in pandas?

Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then call count() to get the count for each group, ignoring None and NaN values.
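With the question's dummy frame, that looks like this (a sketch, assuming pandas is imported as pd):

# Number of non-null values per column within each date group
dummy.groupby('date').count()
#             Action1  Action2
# date
# 01/09/2020        2        2
# 02/09/2020        2        2
# 03/09/2020        3        3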

Can you GROUP BY multiple columns in pandas?

Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
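A minimal sketch of grouping on more than one column (the frame below is made up for illustration):

# Hypothetical sales data
sales = pd.DataFrame({'region': ['N', 'N', 'S', 'S'],
                      'product': ['a', 'a', 'a', 'b'],
                      'qty': [1, 2, 3, 4]})

# Sum of qty for every (region, product) pair
sales.groupby(['region', 'product'])['qty'].sum()
# region  product
# N       a          3
# S       a          3
#         b          4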

How to count occurrences in column in pandas groupby?

Pandas GroupBy – count occurrences in a column:
1. Import the module
2. Create or import the data frame
3. Apply groupby
4. Use either of the two counting methods
5. Display the result
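The snippet does not say which two methods it means; size() and count() are the usual pair, sketched here on the question's frame:

dummy.groupby('date').size()               # rows per group, NaN included
dummy.groupby('date')['Action1'].count()   # non-null values per group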

How to group the rows in a Dataframe in Python?

Use pandas DataFrame.groupby() to group the rows by a column and the count() method to get the count for each group, ignoring None and NaN values. It works with non-floating-point data as well. The example groups on a Courses column and counts how many times each value is present.
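The original example data and its output are not included here; a hypothetical Courses frame along the same lines might look like this:

# Made-up data standing in for the snippet's Courses example
courses = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Spark', 'Pandas'],
                        'Fee': [20000, 25000, 22000, 30000]})

courses.groupby('Courses')['Fee'].count()
# Courses
# Pandas     1
# PySpark    1
# Spark      2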

How do I Group and aggregate by multiple columns in pandas?

Pandas: How to Group and Aggregate by Multiple Columns. Often you may want to group and aggregate by multiple columns of a pandas DataFrame. Fortunately this is easy to do using the pandas groupby() and agg() functions. This tutorial explains several examples of how to use these functions in practice.
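A short sketch of grouping by two columns and applying different aggregations to the rest (hypothetical data again):

stats = pd.DataFrame({'region': ['N', 'N', 'N', 'S', 'S'],
                      'product': ['a', 'a', 'b', 'a', 'a'],
                      'qty': [1, 2, 3, 4, 5],
                      'price': [10, 20, 30, 40, 60]})

# One aggregation per column: total qty, average price
stats.groupby(['region', 'product']).agg({'qty': 'sum', 'price': 'mean'})
#                 qty  price
# region product
# N      a          3   15.0
#        b          3   30.0
# S      a          9   50.0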

How does groupby work in pandas Dataframe?

Each iteration over the groupby object returns two values. The first value is the identifier of the group, i.e. the value of the column(s) on which the rows were grouped. The second value is the group itself, a pandas DataFrame object.
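A small sketch of that iteration on the question's dummy frame:

# Each pass yields (group key, sub-DataFrame)
for day, group in dummy.groupby('date'):
    print(day, len(group))
# 01/09/2020 2
# 02/09/2020 2
# 03/09/2020 3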


3 Answers

'TRUE' and 'FALSE' in your dummy df are strings; you can convert them to int and sum:

dummy.replace({'TRUE': 1, 'FALSE': 0}).groupby('date', as_index=False).sum()

    date        Action1 Action2
0   01/09/2020  2       1
1   02/09/2020  1       1
2   03/09/2020  2       1
answered Oct 22 '22 by Vaishali


You can also try:

dummy.set_index(['date']).eq('TRUE').sum(level='date')

Output:

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1
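A side note, not from the answer itself: the level= argument of sum() was deprecated around pandas 1.3 and later removed, so on recent releases the same idea would be written with an explicit groupby, which should give the same table:

dummy.set_index('date').eq('TRUE').groupby(level='date').sum()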
answered Oct 22 '22 by Quang Hoang


Anyone seeing this answer should look at the answers by @QuangHoang or @Vaishali; they are much better answers. I can't control what the OP chooses, but you should go upvote those answers.

Inspired by @QuangHoang

dummy.iloc[:, 1:].eq('TRUE').groupby(dummy.date).sum()

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1

OLD ANSWER

Fix your dataframe such that it has actual True/False values

from ast import literal_eval

dummy = dummy.assign(**dummy[['Action1', 'Action2']].applymap(str.title).applymap(literal_eval))
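As a side note not in the original answer: applymap has since been deprecated in favour of DataFrame.map (around pandas 2.1), and if the columns really hold the strings 'TRUE'/'FALSE', a plain comparison avoids literal_eval entirely:

# Alternative: convert the string flags to real booleans by comparison
dummy[['Action1', 'Action2']] = dummy[['Action1', 'Action2']].eq('TRUE')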

Then use groupby

dummy.groupby('date').sum()

            Action1  Action2
date                        
01/09/2020        2        1
02/09/2020        1        1
03/09/2020        2        1
answered Oct 22 '22 by piRSquared