Guided by this answer I started to build up pipe for processing columns of dataframe based on its dtype. But after getting some unexpected output and some debugging i ended up with test dataframe and test dtype checking:
# Creating test dataframe
test = pd.DataFrame({'bool' :[False, True], 'int':[-1,2],'float': [-2.5, 3.4],
'compl':np.array([1-1j, 5]),
'dt' :[pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')],
'td' :[pd.Timestamp('2012-03-02')- pd.Timestamp('2016-10-20'),
pd.Timestamp('2010-07-12')- pd.Timestamp('2000-11-10')],
'prd' :[pd.Period('2002-03','D'), pd.Period('2012-02-01', 'D')],
'intrv':pd.arrays.IntervalArray([pd.Interval(0, 0.1), pd.Interval(1, 5)]),
'str' :['s1', 's2'],
'cat' :[1, -1],
'obj' :[[1,2,3], [5435,35,-52,14]]
})
test['cat'] = test['cat'].astype('category')
test
test.dtypes
# Testing types
types = list(test.columns)
df_types = pd.DataFrame(np.zeros((len(types),len(types)), dtype=bool),
index = ['is_'+el for el in types],
columns = types)
for col in test.columns:
df_types.at['is_bool', col] = pd.api.types.is_bool_dtype(test[col])
df_types.at['is_int' , col] = pd.api.types.is_integer_dtype(test[col])
df_types.at['is_float',col] = pd.api.types.is_float_dtype(test[col])
df_types.at['is_compl',col] = pd.api.types.is_complex_dtype(test[col])
df_types.at['is_dt' , col] = pd.api.types.is_datetime64_dtype(test[col])
df_types.at['is_td' , col] = pd.api.types.is_timedelta64_dtype(test[col])
df_types.at['is_prd' , col] = pd.api.types.is_period_dtype(test[col])
df_types.at['is_intrv',col] = pd.api.types.is_interval_dtype(test[col])
df_types.at['is_str' , col] = pd.api.types.is_string_dtype(test[col])
df_types.at['is_cat' , col] = pd.api.types.is_categorical_dtype(test[col])
df_types.at['is_obj' , col] = pd.api.types.is_object_dtype(test[col])
# Styling func
def coloring(df):
clr_g = 'color : green'
clr_r = 'color : red'
mask = ~np.logical_xor(df.values, np.eye(df.shape[0], dtype=bool))
# OUTPUT
return pd.DataFrame(np.where(mask, clr_g, clr_r),
index = df.index,
columns = df.columns)
# OUTPUT colored
df_types.style.apply(coloring, axis=None)
OUTPUT:

bool bool
int int64
float float64
compl complex128
dt datetime64[ns]
td timedelta64[ns]
prd period[D]
intrv interval[float64]
str object
cat category
obj object

Almost everything is good, but this test code produces two questions:
pd.api.types.is_string_dtype fires
on category dtype. Why is that? Should it be treated as 'expected'
behavior? is_string_dtype and is_object_dtype fires on each
other? This is a bit expected, because even in .dtypes both types
are noted as object, but it would be better if someone clarify it
step by step.P.s.: Bonus question - am i right when thinking that pandas has its internal tests that should be passed when building new release (like df_types from test code, but not with 'coloring in red' rather 'recording info about errors')?
EDIT: pandas version 0.24.2.
This comes down to is_string_dtype being a fairly loose check, with the implementation even having a TODO note to make it more strict, linking to Issue #15585.
The reason this check is not strict is because there isn't a dedicated string dtype in pandas, and instead strings are just stored with object dtype, which could really store anything. As such, a more strict check would likely introduce a performance overhead.
To answer your questions:
This is a result of CategoricalDtype.kind being set to 'O', which is one of the loose checks is_string_dtype does. This could probably change in the future given the TODO note, so it's not something I'd rely upon.
Since strings are stored as object dtype it makes sense for is_object_dtype to fire on strings, and I'd consider this behavior to be reliable as the implementation will almost certainly not change in the immediate future. The reverse is true due to the reliance on dtype.kind in is_string_dtype, which has the same caveats as with categoricals described above.
Yes, pandas has a test suite that will run automatically on various CI services for every PR that's created. The test suite includes checks similar to what you're doing.
One tangentially related note to add: there is a library called fletcher that uses Apache Arrow to implement a more native string type in a way that's compatible with pandas. It's still under development and probably doesn't currently have support for all the string operations that pandas does.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With