When using LabelEncoder to encode categorical variables into numerics, how does one keep a dictionary in which the transformation is tracked?
That is, a dictionary in which I can see which values became what:
{'A':1,'B':2,'C':3}
LabelEncoder can be used to normalize labels. It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels. Its fit method fits the label encoder, and fit_transform fits the label encoder and returns the encoded labels.
To reverse the encoding, LabelEncoder provides a method specifically for the task, inverse_transform.
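As a minimal sketch of that round trip (the sample labels here are made up purely for illustration), encoding and then decoding should give back the original values:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels, chosen only for illustration.
labels = ["B", "A", "C", "A"]

le = LabelEncoder()
ids = le.fit_transform(labels)       # integer codes, one per input label
decoded = le.inverse_transform(ids)  # back to the original labels

print(ids.tolist())      # [1, 0, 2, 0]
print(decoded.tolist())  # ['B', 'A', 'C', 'A']
```

The codes follow the sorted order of the distinct labels, which is why "A" maps to 0 here even though "B" appears first in the input.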
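I created a dictionary from classes_:
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
ids = le.fit_transform(labels)  # labels: the categorical values to encode
# classes_ is sorted, so each class's position is its encoded integer
mapping = dict(zip(le.classes_, range(len(le.classes_))))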
To test:
all(mapping[x] == i for x, i in zip(le.inverse_transform(ids), ids))
should return True.
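Putting the pieces together, here is a self-contained sketch with hypothetical sample labels. Note that LabelEncoder assigns codes starting at 0, so the resulting dictionary is {'A': 0, 'B': 1, 'C': 2} rather than the 1-based mapping shown in the question. The names `mapping` and `reverse` are just illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels, chosen only for illustration.
labels = ["C", "A", "B", "A", "C"]

le = LabelEncoder()
ids = le.fit_transform(labels)

# Ask the fitted encoder for the code of each known class and
# convert NumPy scalars to plain Python types for a clean dict.
codes = le.transform(le.classes_)
mapping = {str(cls): int(code) for cls, code in zip(le.classes_, codes)}

# Reverse lookup, from code back to label.
reverse = {code: cls for cls, code in mapping.items()}

print(mapping)  # {'A': 0, 'B': 1, 'C': 2}
print(reverse)  # {0: 'A', 1: 'B', 2: 'C'}

# The dictionary agrees with what the encoder produced.
assert [mapping[x] for x in labels] == ids.tolist()
```

If a 1-based mapping is really needed, adding 1 to each value in the dictionary would do it, but the dictionary would then no longer match the encoder's own output.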
This works because fit_transform uses numpy.unique to simultaneously calculate the label encoding and the classes_ attribute:
def fit_transform(self, y):
    self.classes_, y = np.unique(y, return_inverse=True)
    return y
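To see why the position in classes_ equals the encoded value, it may help to call numpy.unique directly. This is a small sketch on made-up data, not the library's actual code path in every version:

```python
import numpy as np

# Hypothetical sample labels, chosen only for illustration.
y = np.array(["B", "A", "C", "A"])

# classes: the sorted distinct values
# codes:   for each element of y, its index into classes
classes, codes = np.unique(y, return_inverse=True)

print(classes.tolist())  # ['A', 'B', 'C']
print(codes.tolist())    # [1, 0, 2, 0]

# Indexing classes with codes reconstructs the original array,
# so index i in classes is exactly the code for classes[i].
assert (classes[codes] == y).all()
```

Because the codes are indices into the sorted array of distinct values, zipping classes_ with range(len(classes_)) reproduces the encoder's mapping.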