Pandas Percentage count on a DataFrame groupby

Tags: python, pandas

I have a DataFrame (mydf) along the lines of the following:

Index   Feature ID  Stuff1  Stuff2
1       True    1   23      12
2       True    1   54      12
3       False   0   45      67
4       True    0   38      29
5       False   1   32      24
6       False   1   59      39
7       True    0   37      32
8       False   0   76      65
9       False   1   32      12
10      True    0   23      15
..n     True    1   21      99

I am trying to calculate the True and False percentages of Feature for each ID (0 or 1), and I am looking for two outputs, one per ID:

Feature ID  Percent
True    1   20%
False   1   30%

Feature ID  Percent
True    0   30%
False   0   20%

I have made a few attempts, but I end up with counts for every column and then a percentage for every column.

Here's my bad attempt:

percentageID0 = mydf[ mydf['ID']==0 ].set_index(['Feature']).count()
percentageID1 = mydf[ mydf['ID']==1 ].set_index(['Feature']).count()
fullcount = (mydf.groupby(['ID']).count()).sum()

print((percentageID0/fullcount) * 100)
print((percentageID1/fullcount) * 100)

I think I am getting mixed up with the groupby/index format.


asked Aug 20 '15 15:08

MikG

1 Answer

Could be just this:

In [73]:

print(pd.DataFrame({'Percentage': mydf.groupby(['ID', 'Feature']).size() / len(mydf)}))
            Percentage
ID Feature            
0  False           0.2
   True            0.3
1  False           0.3
   True            0.2
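
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the ten visible rows from the question are the whole DataFrame (the elided `..n` row is left out), and the names `percent`, `percent_id0`, `percent_id1` and `within_id` are only illustrative. It also shows one possible way to split the result into one table per ID, plus an alternative in case the percentages are meant to be relative to each ID rather than to the whole frame.

```python
import pandas as pd

# The ten visible rows from the question; the elided "..n" row is omitted.
mydf = pd.DataFrame(
    {
        "Feature": [True, True, False, True, False, False, True, False, False, True],
        "ID": [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
        "Stuff1": [23, 54, 45, 38, 32, 59, 37, 76, 32, 23],
        "Stuff2": [12, 12, 67, 29, 24, 39, 32, 65, 12, 15],
    },
    index=range(1, 11),
)

# Rows per (ID, Feature) pair as a percentage of ALL rows.
# size() counts rows, so the Stuff columns never enter the result.
percent = (mydf.groupby(["ID", "Feature"]).size() * 100 / len(mydf)).rename("Percent")

# One small table per ID (assumes both IDs occur in the data).
percent_id0 = percent.loc[0].reset_index()
percent_id1 = percent.loc[1].reset_index()

# Alternative reading: percentage WITHIN each ID (each ID sums to 100).
within_id = mydf.groupby("ID")["Feature"].value_counts(normalize=True) * 100

print(percent_id1)
print(percent_id0)
print(within_id)
```

With these ten rows, `percent_id0` holds 20% False and 30% True, matching the output above, while `within_id` gives 40% False and 60% True for ID 0, because it divides by the five rows of that ID instead of all ten.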


answered Sep 28 '22 09:09

CT Zhu
