I have a pd.DataFrame which contains columns of different dtypes. I would like to get the number of columns of each type. I use Pandas 0.24.2.
I tried:
dataframe.dtypes.value_counts()
It worked fine for the other dtypes (float64, object, int64), but for some reason it doesn't aggregate the 'category' columns: I get a separate count for each categorical column (as if they were counted as different dtype values).
I also tried:
dataframe.dtypes.groupby(by=dataframe.dtypes).agg(['count'])
But that raises:
TypeError: data type not understood
Reproducible example:
import pandas as pd
df = pd.DataFrame([['A','a',1,10], ['B','b',2,20], ['C','c',3,30]], columns = ['col_1','col_2','col_3','col_4'])
df['col_1'] = df['col_1'].astype('category')
df['col_2'] = df['col_2'].astype('category')
print(df.dtypes.value_counts())
Expected result:
int64 2
category 2
dtype: int64
Actual result:
int64 2
category 1
category 1
dtype: int64
When exploring a new dataset, you often want to count the unique values of one or more columns in a dataframe. Pandas value_counts() returns the counts of unique values. Starting from Pandas version 1.1.0, value_counts() is available on a DataFrame as well as on a Series.
When we have two categorical variables, each value of one is likely to appear in a different number of rows for each value of the other. Counting those rows helps us understand the combinations of the two variables. In R, such counts can be obtained with the count function of the dplyr package.
Using the size() or count() method with pandas.DataFrame.groupby() will generate the number of occurrences of each value present in a particular column of the dataframe.
We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.
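As a minimal sketch of the two counting approaches just described, assuming a small made-up frame (the column names `letter` and `n` are hypothetical and not from the question):

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({'letter': ['A', 'B', 'A', 'C', 'A'], 'n': [1, 2, 3, 4, 5]})

# Count occurrences of each value in a single column
per_value = toy['letter'].value_counts()

# Same counts via groupby: size() returns the number of rows in each group
per_group = toy.groupby('letter').size()

print(per_value)
print(per_group)
```

Both results are Series indexed by the distinct values of `letter`; value_counts() sorts by count, while groupby().size() sorts by the group key.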
As @jezrael mentioned, DataFrame.get_dtype_counts is deprecated in 0.25.0, and dtypes.value_counts(0) would give two separate category entries, so to fix it do:
print(df.dtypes.astype(str).value_counts())
Output:
int64 2
category 2
dtype: int64
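A likely reason for the split counts, sketched here on the question's example: each categorical column carries its own dtype object with its own set of categories, and, as far as I know, two such dtypes only compare equal when their categories match. Converting to strings collapses both to the same label. The variable names below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'a', 1, 10], ['B', 'b', 2, 20], ['C', 'c', 3, 30]],
    columns=['col_1', 'col_2', 'col_3', 'col_4'],
)
df['col_1'] = df['col_1'].astype('category')
df['col_2'] = df['col_2'].astype('category')

# The two dtypes hold different categories (A/B/C vs a/b/c),
# so value_counts() sees them as two distinct values
same_dtype = df['col_1'].dtype == df['col_2'].dtype
print(same_dtype)

# Both dtypes stringify to the same label, so the string counts merge
dtype_names = df.dtypes.astype(str)
counts = dtype_names.value_counts()
print(counts)
```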
Use DataFrame.get_dtype_counts:
print(df.get_dtype_counts())
category 2
int64 2
dtype: int64
But if you use the latest version of pandas, your solution is the recommended one, as the documentation says:
Deprecated since version 0.25.0.
Use .dtypes.value_counts() instead.
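For versions where get_dtype_counts is no longer available, one possible replacement is to count the dtype names rather than the dtype objects. This is only a sketch: `count_dtypes` is a hypothetical helper, not a pandas function, and it relies on each dtype exposing a `name` attribute.

```python
import pandas as pd

def count_dtypes(frame):
    """Hypothetical helper: number of columns per dtype name."""
    # dtype.name is 'category' for every categorical column,
    # regardless of which categories it holds
    return frame.dtypes.map(lambda dtype: dtype.name).value_counts()

df = pd.DataFrame(
    [['A', 'a', 1, 10], ['B', 'b', 2, 20], ['C', 'c', 3, 30]],
    columns=['col_1', 'col_2', 'col_3', 'col_4'],
)
df['col_1'] = df['col_1'].astype('category')
df['col_2'] = df['col_2'].astype('category')

dtype_counts = count_dtypes(df)
print(dtype_counts)
```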