I can't seem to get a simple dtype check working with Pandas' improved Categoricals in v0.15+. Basically I just want something like is_categorical(column) -> True/False.
    import random

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'x': np.linspace(0, 50, 6),
        'y': np.linspace(0, 20, 6),
        'cat_column': random.sample('abcdef', 6),
    })
    df['cat_column'] = pd.Categorical(df['cat_column'])
We can see that the dtype for the categorical column is 'category':
    In [20]: df.cat_column.dtype
    Out[20]: category
And normally we can do a dtype check by just comparing to the name of the dtype:
    In [21]: df.x.dtype == 'float64'
    Out[21]: True
But this doesn't seem to work when trying to check whether the x column is categorical:
    In [22]: df.x.dtype == 'category'
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-94d2608815c4> in <module>()
    ----> 1 df.x.dtype == 'category'

    TypeError: data type "category" not understood
Is there any way to do these types of checks in pandas v0.15+?
Use the name property to do the comparison instead; it should always work because it's just a string:
    >>> import numpy as np
    >>> arr = np.array([1, 2, 3, 4])
    >>> arr.dtype.name
    'int64'
    >>> import pandas as pd
    >>> cat = pd.Categorical(['a', 'b', 'c'])
    >>> cat.dtype.name
    'category'
So, to sum up, you can end up with a simple, straightforward function:
    def is_categorical(array_like):
        # dtype.name is a plain string, so this comparison never raises
        return array_like.dtype.name == 'category'
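As a usage sketch, the helper can be applied to every column of a DataFrame like the one in the question. This assumes fixed letters instead of random.sample so the result is reproducible; the flags dictionary is just an illustration.

```python
import numpy as np
import pandas as pd


def is_categorical(array_like):
    # dtype.name is a plain string ('float64', 'category', ...),
    # so comparing it to 'category' never raises
    return array_like.dtype.name == 'category'


df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': pd.Categorical(list('abcdef')),
})

# Check every column of the DataFrame
flags = {col: is_categorical(df[col]) for col in df.columns}
print(flags)  # {'x': False, 'y': False, 'cat_column': True}
```

The same function also works on a bare pd.Categorical or a NumPy array, since both expose dtype.name.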
First, the string representation of the dtype is 'category' and not 'categorical', so this works:
    In [41]: df.cat_column.dtype == 'category'
    Out[41]: True
But indeed, as you noticed, this comparison raises a TypeError for other dtypes, so you would have to wrap it in a try ... except block.
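A minimal sketch of that try/except wrapper; the name is_categorical_safe is hypothetical. Depending on the NumPy version, comparing a NumPy dtype to an unknown string may raise TypeError (as in the question) or simply return False, and the wrapper maps both outcomes to False.

```python
import numpy as np
import pandas as pd


def is_categorical_safe(series):
    # Hypothetical helper: compare the dtype to the string 'category'
    # and treat a TypeError from NumPy's dtype comparison as "not categorical"
    try:
        return bool(series.dtype == 'category')
    except TypeError:
        return False


df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'cat_column': pd.Categorical(list('abcdef')),
})

print(is_categorical_safe(df['cat_column']))  # True
print(is_categorical_safe(df['x']))           # False
```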
Other ways to check, using pandas' own type-checking utilities:
    In [42]: isinstance(df.cat_column.dtype, pd.api.types.CategoricalDtype)
    Out[42]: True

    In [43]: pd.api.types.is_categorical_dtype(df.cat_column)
    Out[43]: True
For non-categorical columns, those statements return False instead of raising an error. For example:
    In [44]: pd.api.types.is_categorical_dtype(df.x)
    Out[44]: False
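In recent pandas releases (2.1 and later), pd.api.types.is_categorical_dtype is deprecated in favour of the isinstance check, so a sketch that avoids it might look like the following. The helper name has_categorical_dtype is hypothetical; DataFrame.select_dtypes is shown as one way to list all categorical columns at once.

```python
import numpy as np
import pandas as pd


def has_categorical_dtype(series):
    # Hypothetical helper: isinstance check against the dtype class,
    # which returns False (rather than raising) for any other dtype
    return isinstance(series.dtype, pd.api.types.CategoricalDtype)


df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'cat_column': pd.Categorical(list('abcdef')),
})

print(has_categorical_dtype(df['cat_column']))  # True
print(has_categorical_dtype(df['x']))           # False

# select_dtypes filters columns by dtype; 'category' selects categorical ones
cat_cols = list(df.select_dtypes(include='category').columns)
print(cat_cols)  # ['cat_column']
```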
For much older versions of pandas, replace pd.api.types in the above snippet with pd.core.common.