
How to count the number of categorical features with Pandas?

I have a pd.DataFrame whose columns have different dtypes. I would like to count the columns of each dtype. I use Pandas 0.24.2.

I tried:

    dataframe.dtypes.value_counts()

It works fine for the other dtypes (float64, object, int64), but for some reason it doesn't aggregate the 'category' columns: I get a separate count for each categorical column, as if each one had a different dtype.

I also tried:

    dataframe.dtypes.groupby(by=dataframe.dtypes).agg(['count'])

But that raises:

    TypeError: data type not understood

Reproducible example:

    import pandas as pd

    df = pd.DataFrame([['A','a',1,10], ['B','b',2,20], ['C','c',3,30]], columns=['col_1','col_2','col_3','col_4'])

    df['col_1'] = df['col_1'].astype('category')
    df['col_2'] = df['col_2'].astype('category')

    print(df.dtypes.value_counts())

Expected result:

    int64       2
    category    2
    dtype: int64

Actual result:

    int64       2
    category    1
    category    1
    dtype: int64

Krystof asked Jul 26 '19 05:07


2 Answers

As @jezrael mentioned, DataFrame.get_dtype_counts is deprecated since 0.25.0, and dtypes.value_counts() gives two separate category rows. To fix that, convert the dtypes to strings before counting:

    print(df.dtypes.astype(str).value_counts())

Output:

    int64       2
    category    2
    dtype: int64
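
To illustrate why the string conversion helps, here is a small sketch based on the question's example. It assumes the split happens because each categorical column carries its own CategoricalDtype with its own categories, so the two dtype objects compare unequal even though both print as "category". The variable names (dtype_1, same_dtype, and so on) are only for illustration.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    [['A', 'a', 1, 10], ['B', 'b', 2, 20], ['C', 'c', 3, 30]],
    columns=['col_1', 'col_2', 'col_3', 'col_4'],
)
df['col_1'] = df['col_1'].astype('category')
df['col_2'] = df['col_2'].astype('category')

dtype_1 = df['col_1'].dtype  # categories: A, B, C
dtype_2 = df['col_2'].dtype  # categories: a, b, c

# The dtype objects differ because their categories differ...
same_dtype = dtype_1 == dtype_2
# ...but their string names are identical.
same_name = str(dtype_1) == str(dtype_2)

# Counting the names instead of the objects merges the two rows.
counts = df.dtypes.astype(str).value_counts()

print(same_dtype, same_name)
print(counts)
```

If two categorical columns happened to share exactly the same categories, their dtypes would compare equal and the plain value_counts() would already group them, so the string conversion is the safer general approach.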

U12-Forward answered Oct 01 '22 17:10


Use DataFrame.get_dtype_counts:

    print(df.get_dtype_counts())
    category    2
    int64       2
    dtype: int64

But if you use the latest version of pandas, your solution is the recommended one, because the documentation for get_dtype_counts says:

"Deprecated since version 0.25.0. Use .dtypes.value_counts() instead."
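
If only the number of categorical columns is needed (as the question title asks), one possible alternative that does not rely on get_dtype_counts is DataFrame.select_dtypes. This is a sketch on the question's example frame, not something proposed in the original answers; n_categorical and categorical_columns are illustrative names.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    [['A', 'a', 1, 10], ['B', 'b', 2, 20], ['C', 'c', 3, 30]],
    columns=['col_1', 'col_2', 'col_3', 'col_4'],
)
df['col_1'] = df['col_1'].astype('category')
df['col_2'] = df['col_2'].astype('category')

# Keep only the categorical columns, regardless of their categories.
categorical = df.select_dtypes(include='category')

n_categorical = categorical.shape[1]             # number of such columns
categorical_columns = categorical.columns.tolist()

print(n_categorical)
print(categorical_columns)
```

This counts a single dtype family at a time. For a full breakdown across all dtypes, counting the dtype names as strings remains the more direct route.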

jezrael answered Oct 01 '22 17:10