Cardinality / distinct count for all columns in pandas dataframe

Question

While dataframe.describe() is useful for summary descriptive statistics, specifically quartiles and range values, it apparently does not have a cardinality count option.

What are the options, or alternative approaches, for obtaining cardinality counts for the columns of a dataframe, ideally by supplying a list of columns and defaulting to "all"?

Nancy · Accepted Answer

You can count the distinct values in each column (Series) of the dataframe with nunique. Applying it across the dataframe returns each column name together with its cardinality. For example, for the dataframe:

import pandas as pd

names = pd.Categorical(['Tomba', 'Monica', 'Monica', 'Nancy', 'Neil', 'Chris'])
courses = pd.Categorical(['Physics', 'Geometry', 'Physics', 'Biology', 'Algebra', 'Algebra'])

df = pd.DataFrame({
    'Name': names,
    'Course': courses
})


Out[72]: df
     Course    Name
0   Physics   Tomba
1  Geometry  Monica
2   Physics  Monica
3   Biology   Nancy
4   Algebra    Neil
5   Algebra   Chris

df.apply(pd.Series.nunique)

Course    4
Name      5
dtype: int64
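To cover the "list of columns, defaulting to all" part of the question, the same idea can be wrapped in a small function. This is only a sketch: the helper name cardinality and its parameters are hypothetical, and it relies on DataFrame.nunique, which recent pandas versions provide as a direct equivalent of df.apply(pd.Series.nunique).

```python
import pandas as pd


def cardinality(df, columns=None, dropna=True):
    """Return the distinct-value count per column (hypothetical helper).

    columns=None means "all columns"; otherwise pass a list of column names.
    """
    cols = list(df.columns) if columns is None else list(columns)
    # DataFrame.nunique counts distinct values column by column.
    return df[cols].nunique(dropna=dropna)


df = pd.DataFrame({
    'Name': ['Tomba', 'Monica', 'Monica', 'Nancy', 'Neil', 'Chris'],
    'Course': ['Physics', 'Geometry', 'Physics', 'Biology', 'Algebra', 'Algebra'],
})

all_counts = cardinality(df)               # every column
course_only = cardinality(df, ['Course'])  # a chosen subset
print(all_counts)
print(course_only)
```

Passing dropna=False would count missing values as one additional distinct value, which may or may not be what you want for a cardinality report.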

Zhongjun 'Mark' Jin · Answer

Alternatively, you can use value_counts and take the length of the result as the cardinality. Here is an example.

import pandas as pd

names = pd.Categorical(['Tomba', 'Monica', 'Monica', 'Nancy', 'Neil', 'Chris'])
courses = pd.Categorical(['Physics', 'Geometry', 'Physics', 'Biology', 'Algebra', 'Algebra'])
df = pd.DataFrame({'Name': names, 'Course': courses})

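# Iterating over a dataframe yields its column names.
for col in df:
    # value_counts returns one row per distinct value, so its length is the cardinality.
    cardinality = len(pd.Index(df[col]).value_counts())
    print(f"{col}: {cardinality}")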
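One caveat when the columns are categorical, as in this example: value_counts on categorical data reports every declared category, including ones that never occur, so its length can exceed the number of values actually present. The small sketch below uses a made-up series (not from the original post) to show the difference from nunique.

```python
import pandas as pd

# Hypothetical data: category 'c' is declared but never appears in the values.
s = pd.Series(pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c']))

declared = len(s.value_counts())  # one row per declared category, 'c' included with count 0
present = s.nunique()             # only values that actually occur

print(declared, present)
```

If the two numbers differ for a column, the dtype carries unused categories; calling s.cat.remove_unused_categories() first makes the two approaches agree.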

Tags:

python

pandas

WestCoastProjects
