While DataFrame.describe() is useful for summary descriptive statistics, specifically quartiles and range values, it does not appear to have a cardinality count option.
What are the options, or alternative approaches, for obtaining cardinality counts for a DataFrame, ideally by supplying a list of columns and defaulting to all columns?
You can count the distinct values in each column (Series) of the DataFrame with nunique. This gives you each column name together with its cardinality. For example, for the DataFrame:
import pandas as pd

names = pd.Categorical(['Tomba', 'Monica', 'Monica', 'Nancy', 'Neil', 'Chris'])
courses = pd.Categorical(['Physics', 'Geometry', 'Physics', 'Biology', 'Algebra', 'Algebra'])
df = pd.DataFrame({
    'Name': names,
    'Course': courses
})
In [72]: df
Out[72]:
     Course    Name
0   Physics   Tomba
1  Geometry  Monica
2   Physics  Monica
3   Biology   Nancy
4   Algebra    Neil
5   Algebra   Chris
df.apply(pd.Series.nunique)
Course    4
Name      5
dtype: int64
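The question also asks for a way to pass a list of columns and default to all of them. One possible sketch is below; it relies on DataFrame.nunique(), which gives the same per-column counts as the apply call above. The helper name cardinality and its columns parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def cardinality(df, columns=None):
    """Return the number of distinct values per column as a Series.

    `columns` is an optional list of column names (hypothetical helper
    parameter); None means "use every column".
    """
    if columns is None:
        columns = list(df.columns)
    # DataFrame.nunique() counts distinct values column by column.
    # Missing values are excluded by default (dropna=True).
    return df[columns].nunique()


df = pd.DataFrame({
    'Name': ['Tomba', 'Monica', 'Monica', 'Nancy', 'Neil', 'Chris'],
    'Course': ['Physics', 'Geometry', 'Physics', 'Biology', 'Algebra', 'Algebra'],
})

all_counts = cardinality(df)               # all columns
course_only = cardinality(df, ['Course'])  # a chosen subset
print(all_counts)
print(course_only)
```

If missing values should count as a distinct value, nunique accepts dropna=False.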
Alternatively, you can use value_counts and take the length of the result, since it contains one entry per distinct value. Here is an example:
import pandas as pd
names = pd.Categorical(['Tomba', 'Monica', 'Monica', 'Nancy', 'Neil', 'Chris'])
courses = pd.Categorical(['Physics', 'Geometry', 'Physics', 'Biology', 'Algebra', 'Algebra'])
df = pd.DataFrame({'Name': names, 'Course': courses})
for col in df:
    # value_counts() has one row per distinct value, so its length is the cardinality
    cardinality = len(pd.Index(df[col]).value_counts())
    print(df[col].name + ": " + str(cardinality))
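It may also be worth noting that describe() itself reports a unique row for non-numeric columns, and include='all' asks it to summarize every column at once. The sketch below assumes that behavior; the Score column is hypothetical data added only to show what happens with a numeric column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tomba', 'Monica', 'Monica', 'Nancy', 'Neil', 'Chris'],
    'Course': ['Physics', 'Geometry', 'Physics', 'Biology', 'Algebra', 'Algebra'],
    'Score': [90, 85, 85, 70, 60, 75],  # hypothetical numeric column
})

# include='all' summarizes numeric and non-numeric columns together.
summary = df.describe(include='all')

# The 'unique' row holds the cardinality of each non-numeric column;
# numeric columns get NaN there, so nunique() is still needed for them.
unique_row = summary.loc['unique']
print(unique_row)
```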