Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time?
For example, suppose I generate a DataFrame as follows:
    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
I can get a DataFrame like this:
       a  b  c  d
    0  0  1  1  0
    1  1  1  1  1
    2  1  1  1  0
    3  0  1  0  0
    4  0  0  0  1
    5  0  1  1  0
    6  0  1  1  1
    7  1  0  1  0
    8  1  0  1  1
    9  0  1  1  0
How do I conveniently get the value counts for every column at once, obtaining the following?
       a  b  c  d
    0  6  3  2  6
    1  4  7  8  4
My current solution is:
    pieces = []

    for col in df.columns:
        tmp_series = df[col].value_counts()
        tmp_series.name = col
        pieces.append(tmp_series)

    df_value_counts = pd.concat(pieces, axis=1)
But there must be a simpler way, like stacking, pivoting, or groupby?
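For instance, the closest groupby formulation I could come up with (a sketch relying on melt's default variable/value column names) is hardly any shorter:

    # melt to long form, count each (column, value) pair,
    # then pivot the column names back out as columns
    df.melt().groupby(['variable', 'value']).size().unstack('variable', fill_value=0)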
Just call apply and pass pd.Series.value_counts:
    In [212]:
    df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
    df.apply(pd.Series.value_counts)

    Out[212]:
       a  b  c  d
    0  4  6  4  3
    1  6  4  6  7
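One caveat, assuming the same df as above: if a value never occurs in some column, value_counts leaves a NaN in that cell and the whole result is upcast to float. A minimal fix is:

    # fill missing (column, value) combinations and restore the integer dtype
    df.apply(pd.Series.value_counts).fillna(0).astype(int)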
There is actually a fairly interesting and advanced way of approaching this problem with crosstab and melt:

    df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                       'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                       'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

    df

           a       b       c
    0  table    lamp  mirror
    1  chair  candle  mirror
    2  chair   chair  mirror
    3   lamp    lamp  mirror
    4    bed     bed  mirror
We can first melt the DataFrame:

    df1 = df.melt(var_name='columns', value_name='index')

    df1

       columns   index
    0        a   table
    1        a   chair
    2        a   chair
    3        a    lamp
    4        a     bed
    5        b    lamp
    6        b  candle
    7        b   chair
    8        b    lamp
    9        b     bed
    10       c  mirror
    11       c  mirror
    12       c  mirror
    13       c  mirror
    14       c  mirror
And then use the crosstab function to count the values for each column. This preserves the data type as integers, which wouldn't be the case for the currently selected answer (there, any value missing from a column produces a NaN, upcasting the whole result to float):
    pd.crosstab(index=df1['index'], columns=df1['columns'])

    columns  a  b  c
    index
    bed      1  1  0
    candle   0  1  0
    chair    2  1  0
    lamp     1  2  0
    mirror   0  0  5
    table    1  0  0
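For comparison, the same two steps applied to the question's original 0/1 frame (with np.random.seed(0) as above) reproduce exactly the table the question asked for:

    df1 = df.melt(var_name='columns', value_name='index')
    pd.crosstab(index=df1['index'], columns=df1['columns'])

    columns  a  b  c  d
    index
    0        6  3  2  6
    1        4  7  8  4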
Or in one line, which expands the column names to parameter names with ** (this is advanced). It works because ** unpacks the DataFrame like a mapping, so the melted columns named 'index' and 'columns' are passed as crosstab's index and columns keyword arguments:

    pd.crosstab(**df.melt(var_name='columns', value_name='index'))
Also, value_counts is now a top-level function, so you can simplify the currently selected answer to the following:

    df.apply(pd.value_counts)
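Note that the top-level pd.value_counts has since been deprecated (around pandas 2.1), so on recent versions prefer the Series method, for example:

    # deprecation-safe spelling of the same idea
    df.apply(lambda s: s.value_counts())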