I am trying to use a loop to perform some operations on the numeric and categorical columns of a pandas DataFrame.
import seaborn as sns
df = sns.load_dataset('diamonds')
print(df.dtypes, '\n')
carat       float64
cut        category
color      category
clarity    category
depth       float64
table       float64
price         int64
x           float64
y           float64
z           float64
dtype: object
In the following code, I simply copied and pasted 'float64' and 'category' from the output of the preceding step.
for i in df.columns:
    if df[i].dtypes in ['float64']:
        print(i)

for i in df.columns:
    if df[i].dtypes in ['category']:
        print(i)
It works for 'float64' but raises an error for 'category'.
Why is this? Thanks very much!
carat
depth
table
x
y
z
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-74-8e6aa9d4726e> in <module>
4
5 for i in df.columns:
----> 6 if df[i].dtypes in ['category']:
7 print(i)
TypeError: data type 'category' not understood
Try using pd.api.types.is_categorical_dtype:
for i in df.columns:
    if pd.api.types.is_categorical_dtype(df[i]):
        print(i)
Or check the dtype name:
for i in df.columns:
    if df[i].dtype.name == 'category':
        print(i)
Output:
cut
color
clarity
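Two further options avoid comparing a dtype to a string altogether. This matters on newer pandas, where is_categorical_dtype is deprecated (as of pandas 2.1) in favour of an isinstance check. The sketch below is only an illustration: it assumes a reasonably recent pandas, and it uses a small hand-made DataFrame with made-up values as a stand-in for the diamonds data so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data (values are made up for illustration).
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23],
    "cut": pd.Categorical(["Ideal", "Premium", "Good"]),
    "color": pd.Categorical(["E", "E", "E"]),
    "price": [326, 326, 327],
})

# Option 1: isinstance check against the dtype class, so no string comparison is involved.
categorical_cols = [c for c in df.columns
                    if isinstance(df[c].dtype, pd.CategoricalDtype)]

# Option 2: let pandas filter the columns by dtype, without an explicit loop.
float_cols = list(df.select_dtypes(include="float64").columns)
category_cols = list(df.select_dtypes(include="category").columns)

print(categorical_cols)
print(float_cols)
print(category_cols)
```

select_dtypes is usually the shorter choice when the goal is to collect all columns of a kind; the isinstance check fits better inside an existing per-column loop.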
This is a bug in Pandas; here is the GitHub issue. One sentence from it reads:
df.dtypes[colname] == 'category' evaluates as True for categorical columns and raises TypeError: data type "category" not understood for np.float64 columns.
So it actually works: the comparison does give True for categorical columns. The problem is that the equality check of the NumPy float64 dtype does not cooperate with pandas-specific dtypes such as category.
If you order the columns differently, so that the first three columns are the categorical ones, the loop prints those column names, but as soon as it reaches a float column it raises the error because of this NumPy/pandas dtype mismatch:
>>> df = df.iloc[:, 1:]
>>> df
             cut color clarity  depth  table  price     x     y     z
0          Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1        Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2           Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3        Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4           Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75
...          ...   ...     ...    ...    ...    ...   ...   ...   ...
53935      Ideal     D     SI1   60.8   57.0   2757  5.75  5.76  3.50
53936       Good     D     SI1   63.1   55.0   2757  5.69  5.75  3.61
53937  Very Good     D     SI1   62.8   60.0   2757  5.66  5.68  3.56
53938    Premium     H     SI2   61.0   58.0   2757  6.15  6.12  3.74
53939      Ideal     D     SI2   62.2   55.0   2757  5.83  5.87  3.64

[53940 rows x 9 columns]
>>> for i in df.columns:
        if df[i].dtypes in ['category']:
            print(i)

cut
color
clarity
Traceback (most recent call last):
  File "<pyshell#138>", line 2, in <module>
    if df[i].dtypes in ['category']:
TypeError: data type 'category' not understood
>>>
As you can see, it did print the categorical columns, but once an np.float64 column appears, NumPy's __eq__ magic method raises the error, because 'category' is not a dtype name that NumPy understands.
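If the string comparison has to stay, one way to make it safe is to wrap it so that a TypeError simply counts as "no match". The helper below, dtype_is, is a hypothetical name introduced only for this sketch and is not part of pandas or NumPy. Depending on the NumPy version, the comparison against an unknown dtype string may return False instead of raising, and the helper is written to give the same result in both cases.

```python
import numpy as np
import pandas as pd


def dtype_is(dtype, name):
    """Return True if `dtype == name`, treating a TypeError as 'not equal'.

    `dtype_is` is a hypothetical helper for illustration only.
    """
    try:
        return bool(dtype == name)
    except TypeError:
        # Some NumPy versions raise here when `name` is not a NumPy dtype string.
        return False


float_dtype = np.dtype("float64")
cat_dtype = pd.CategoricalDtype()

print(dtype_is(float_dtype, "float64"))
print(dtype_is(float_dtype, "category"))
print(dtype_is(cat_dtype, "category"))
```

In a loop over df.columns, dtype_is(df[i].dtype, 'category') could then replace df[i].dtypes in ['category'].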