How to count the number of categorical features with Pandas?

Tags: pandas, categorical-data, dtype

I have a pd.DataFrame whose columns have different dtypes, and I would like to count the columns of each dtype. I am using pandas 0.24.2.

I tried:

    dataframe.dtypes.value_counts()

It works fine for the other dtypes (float64, object, int64), but for some reason it doesn't aggregate the 'category' columns: I get a separate count for each one, as if they were counted as different dtypes.

I also tried:

    dataframe.dtypes.groupby(by=dataframe.dtypes).agg(['count'])

But that raises:

    TypeError: data type not understood

Reproducible example:

    import pandas as pd

    df = pd.DataFrame([['A','a',1,10], ['B','b',2,20], ['C','c',3,30]],
                      columns=['col_1','col_2','col_3','col_4'])

    df['col_1'] = df['col_1'].astype('category')
    df['col_2'] = df['col_2'].astype('category')

    print(df.dtypes.value_counts())

Expected result:

    int64       2
    category    2
    dtype: int64

Actual result:

    int64       2
    category    1
    category    1
    dtype: int64

asked Jul 26 '19 05:07

Krystof

2 Answers

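As @jezrael mentioned, get_dtype_counts is deprecated since 0.25.0, and dtypes.value_counts() reports the two category columns separately, because each column has its own set of categories and therefore its own categorical dtype. To fix it, convert the dtypes to strings before counting: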

    print(df.dtypes.astype(str).value_counts())

Output:

    int64       2
    category    2
    dtype: int64
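To see why the string conversion helps, here is a minimal sketch that reuses the question's example frame. The variable names (same_dtype, names, counts) are mine, not from the original post, and I have only reasoned about this against current pandas behaviour, so treat it as an illustration rather than a definitive explanation of 0.24.2 internals.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "a", 1, 10], ["B", "b", 2, 20], ["C", "c", 3, 30]],
    columns=["col_1", "col_2", "col_3", "col_4"],
)
df["col_1"] = df["col_1"].astype("category")
df["col_2"] = df["col_2"].astype("category")

# Each categorical dtype carries its own categories (A/B/C vs a/b/c),
# so the two dtype objects do not compare equal.
same_dtype = df["col_1"].dtype == df["col_2"].dtype

# Their string form is identical, which is what astype(str) relies on.
names = [str(df[col].dtype) for col in ["col_1", "col_2"]]

# Counting the string names groups both categorical columns together.
counts = df.dtypes.astype(str).value_counts()

print(same_dtype, names)
print(counts)
```

Since the counting is done on plain strings, any two columns whose dtypes print the same way end up in the same bucket, regardless of their categories.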

answered Oct 01 '22 17:10

U12-Forward

Use DataFrame.get_dtype_counts:

    print(df.get_dtype_counts())

    category    2
    int64       2
    dtype: int64

But if you are on the latest version of pandas, your solution is the recommended one, because the documentation marks get_dtype_counts as deprecated:

Deprecated since version 0.25.0. Use .dtypes.value_counts() instead.
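If all you need is the number of categorical columns (as in the question title) rather than a count per dtype, one possible alternative that does not depend on get_dtype_counts is DataFrame.select_dtypes. This is only a sketch on the question's example frame; n_categorical and categorical_columns are names I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "a", 1, 10], ["B", "b", 2, 20], ["C", "c", 3, 30]],
    columns=["col_1", "col_2", "col_3", "col_4"],
)
df["col_1"] = df["col_1"].astype("category")
df["col_2"] = df["col_2"].astype("category")

# Keep only the categorical columns, whatever their categories are.
categorical = df.select_dtypes(include="category")

# Number of categorical columns and their names.
n_categorical = categorical.shape[1]
categorical_columns = categorical.columns.tolist()

print(n_categorical, categorical_columns)
```

This selects on the kind of dtype rather than on equality between dtype objects, so columns with different categories are still counted together.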

answered Oct 01 '22 17:10

jezrael
