Sklearn LabelEncoder throws TypeError in sort

Tags:

machine-learning

scikit-learn

sklearn-pandas

I am learning machine learning using the Titanic dataset from Kaggle. I am using sklearn's LabelEncoder to transform text data into numeric labels. The following code works fine for "Sex" but not for "Embarked".

encoder = preprocessing.LabelEncoder()
features["Sex"] = encoder.fit_transform(features["Sex"])
features["Embarked"] = encoder.fit_transform(features["Embarked"])

This is the error I got:

Traceback (most recent call last):
  File "../src/script.py", line 20, in <module>
    features["Embarked"] = encoder.fit_transform(features["Embarked"])
  File "/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 131, in fit_transform
    self.classes_, y = np.unique(y, return_inverse=True)
  File "/opt/conda/lib/python3.6/site-packages/numpy/lib/arraysetops.py", line 211, in unique
    perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
TypeError: '>' not supported between instances of 'str' and 'float'

[Image: description of dataset]


asked May 13 '17 18:05

Bhavani Ravi

2 Answers

I solved it myself. The problem was that this feature contained NaN values. NaN is a float, so the column mixed str and float values, which cannot be compared when sorting. Replacing NaN with a numeric value still throws an error, because the column would still mix datatypes. So I replaced it with a string value:

 features["Embarked"] = encoder.fit_transform(features["Embarked"].fillna('0'))
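To make the cause visible, here is a minimal sketch on a made-up five-row column (a hypothetical stand-in for the real "Embarked" data, not the actual dataset). It shows that the column mixes `str` and `float` values and that filling NaN with a string placeholder lets the encoder sort the labels. Whether the unfilled column raises may depend on your scikit-learn version, so the sketch only runs the filled variant.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the Titanic "Embarked" column, with one missing value.
features = pd.DataFrame({"Embarked": ["S", "C", np.nan, "Q", "S"]}, dtype=object)

# NaN is a float, so the column holds both str and float values.
value_types = {type(v).__name__ for v in features["Embarked"]}

# Filling NaN with a string placeholder makes every value a str,
# so the labels can be sorted and encoded.
encoder = LabelEncoder()
encoded = encoder.fit_transform(features["Embarked"].fillna("0"))
classes = list(encoder.classes_)
```

The placeholder `'0'` becomes its own class, so missing values end up with a distinct label rather than being dropped.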


answered Sep 23 '22 20:09

Bhavani Ravi

Try this function; you'll need to pass it a pandas DataFrame. It checks the dtype of each column and encodes the object (text) columns, so you don't need to check the types yourself.

from sklearn import preprocessing

def encoder(data):
    '''Map the categorical variables to numbers to work with scikit-learn.'''
    for col in data.columns:
        # Only object (text) columns need encoding.
        if data.dtypes[col] == "object":
            le = preprocessing.LabelEncoder()
            le.fit(data[col])
            data[col] = le.transform(data[col])
    return data
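A column-by-column encoder like this can still hit the same mixed-type problem if an object column contains NaN. One possible variant, sketched under assumptions, fills missing values with a string before encoding and keeps each fitted encoder so labels can be mapped back later. The names `encode_object_columns` and `placeholder`, and the sample data, are hypothetical and not part of the answer above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_object_columns(data, placeholder="missing"):
    """Label-encode every object column, filling NaN with a string first.

    Hypothetical helper: returns an encoded copy plus the fitted encoders.
    """
    encoded = data.copy()  # leave the caller's DataFrame untouched
    encoders = {}
    for col in encoded.columns:
        if encoded[col].dtype == object:
            le = LabelEncoder()
            encoded[col] = le.fit_transform(encoded[col].fillna(placeholder))
            encoders[col] = le
    return encoded, encoders

# Made-up sample data; the text columns are given object dtype explicitly.
df = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "female"], dtype=object),
    "Embarked": pd.Series(["S", np.nan, "C"], dtype=object),
    "Age": [22.0, 38.0, 26.0],
})
encoded_df, encoders = encode_object_columns(df)
```

Keeping one encoder per column means `encoders["Embarked"].inverse_transform(...)` can recover the original labels, which a single reused encoder would not allow.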

answered Sep 21 '22 20:09

Giovanni Bruner
