I'm facing this error for multiple variables even treating missing values. For example:
le = preprocessing.LabelEncoder() categorical = list(df.select_dtypes(include=['object']).columns.values) for cat in categorical: print(cat) df[cat].fillna('UNK', inplace=True) df[cat] = le.fit_transform(df[cat]) # print(le.classes_) # print(le.transform(le.classes_)) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-24-424a0952f9d0> in <module>() 4 print(cat) 5 df[cat].fillna('UNK', inplace=True) ----> 6 df[cat] = le.fit_transform(df[cat].fillna('UNK')) 7 # print(le.classes_) 8 # print(le.transform(le.classes_)) C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self, y) 129 y = column_or_1d(y, warn=True) 130 _check_numpy_unicode_bug(y) --> 131 self.classes_, y = np.unique(y, return_inverse=True) 132 return y 133 C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts) 209 210 if optional_indices: --> 211 perm = ar.argsort(kind='mergesort' if return_index else 'quicksort') 212 aux = ar[perm] 213 else: TypeError: '>' not supported between instances of 'float' and 'str'
Checking the variable that lead to the error results ins:
df['CRM do Médico'].isnull().sum() 0
Besides nan values, what could be causing this error?
The Python "TypeError: '>' not supported between instances of 'float' and 'str'" occurs when we use a comparison operator between values of type float and str . To solve the error, convert the string to a float before comparing, e.g. my_float > float(my_str) .
The Python "TypeError: '<' not supported between instances of 'str' and 'int'" occurs when we use a comparison operator between values of type str and int . To solve the error, convert the string to an integer before comparing, e.g. int(my_str) < my_int .
This is due to the series df[cat]
containing elements that have varying data types e.g.(strings and/or floats). This could be due to the way the data is read, i.e. numbers are read as float and text as strings or the datatype was float and changed after the fillna
operation.
In other words
pandas data type 'Object' indicates mixed types rather than str type
so using the following line:
df[cat] = le.fit_transform(df[cat].astype(str))
should help
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With