 

Check if dataframe column is Categorical

Tags:

python

pandas

I can't seem to get a simple dtype check working with Pandas' improved Categoricals in v0.15+. Basically I just want something like is_categorical(column) -> True/False.

    import pandas as pd
    import numpy as np
    import random

    df = pd.DataFrame({
        'x': np.linspace(0, 50, 6),
        'y': np.linspace(0, 20, 6),
        'cat_column': random.sample('abcdef', 6)
    })
    df['cat_column'] = pd.Categorical(df['cat_column'])

We can see that the dtype for the categorical column is 'category':

    df.cat_column.dtype
    Out[20]: category

And normally we can do a dtype check by just comparing to the name of the dtype:

    df.x.dtype == 'float64'
    Out[21]: True

But this doesn't seem to work when trying to check if the x column is categorical:

    df.x.dtype == 'category'
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-94d2608815c4> in <module>()
    ----> 1 df.x.dtype == 'category'

    TypeError: data type "category" not understood

Is there any way to do these types of checks in pandas v0.15+?

asked Nov 14 '14 07:11 by Marius


2 Answers

Use the dtype's name property for the comparison instead. It should always work because the name is just a string:

    >>> import numpy as np
    >>> arr = np.array([1, 2, 3, 4])
    >>> arr.dtype.name
    'int64'

    >>> import pandas as pd
    >>> cat = pd.Categorical(['a', 'b', 'c'])
    >>> cat.dtype.name
    'category'

So, to sum up, you can write a simple, straightforward function:

    def is_categorical(array_like):
        return array_like.dtype.name == 'category'
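A quick usage sketch of is_categorical applied to every column of a small DataFrame shaped like the one in the question. The fixed column values (instead of random.sample) are an illustrative assumption so the result is reproducible.

```python
import numpy as np
import pandas as pd


def is_categorical(array_like):
    # dtype.name is always a plain string, so this comparison never
    # raises for non-categorical dtypes; it simply returns False.
    return array_like.dtype.name == 'category'


df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': pd.Categorical(list('abcdef')),
})

# Map each column name to whether it holds categorical data.
flags = {name: is_categorical(df[name]) for name in df.columns}
print(flags)  # {'x': False, 'y': False, 'cat_column': True}

# Plain NumPy arrays also expose dtype.name, so the same check applies.
print(is_categorical(np.array([1, 2, 3])))  # False
```

Because it only reads .dtype.name, the function works on anything with a dtype attribute: Series, Categorical objects, and NumPy arrays alike.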
answered Oct 14 '22 22:10 by Jeff Tratner


First, the string representation of the dtype is 'category' and not 'categorical', so this works:

    In [41]: df.cat_column.dtype == 'category'
    Out[41]: True

But indeed, as you noticed, this comparison raises a TypeError for other dtypes, so you would have to wrap it in a try/except block.
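A minimal sketch of such a wrapper. The helper name is_category_dtype_str is hypothetical. It catches the TypeError shown in the question, and it also behaves correctly if a given NumPy version simply returns False for the comparison instead of raising.

```python
import numpy as np
import pandas as pd


def is_category_dtype_str(dtype):
    """Hypothetical helper: string comparison guarded by try/except."""
    try:
        # A CategoricalDtype compares equal to the string 'category';
        # a NumPy dtype either returns False or raises TypeError.
        return bool(dtype == 'category')
    except TypeError:
        # NumPy could not interpret 'category' as a dtype string,
        # so this cannot be a categorical column.
        return False


s_cat = pd.Series(list('abc'), dtype='category')
s_float = pd.Series(np.linspace(0, 1, 3))

print(is_category_dtype_str(s_cat.dtype))    # True
print(is_category_dtype_str(s_float.dtype))  # False
```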


Other ways to check, using the type-checking helpers in pandas.api.types:

    In [42]: isinstance(df.cat_column.dtype, pd.api.types.CategoricalDtype)
    Out[42]: True

    In [43]: pd.api.types.is_categorical_dtype(df.cat_column)
    Out[43]: True

For non-categorical columns, those statements will return False instead of raising an error. For example:

    In [44]: pd.api.types.is_categorical_dtype(df.x)
    Out[44]: False

For much older versions of pandas, replace pd.api.types in the snippets above with pd.core.common.
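The isinstance check can also be used to pick out every categorical column of a DataFrame at once. This is a sketch assuming a pandas version that provides pandas.api.types.CategoricalDtype; recent pandas releases deprecate is_categorical_dtype in favor of exactly this kind of isinstance check. The DataFrame contents are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': pd.Categorical(list('abcdef')),
})

# df.dtypes is a Series mapping column name -> dtype.
# isinstance never raises for non-categorical dtypes; it just returns False.
categorical_cols = [name for name, dtype in df.dtypes.items()
                    if isinstance(dtype, CategoricalDtype)]
print(categorical_cols)  # ['cat_column']
```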

answered Oct 14 '22 23:10 by joris