I have been doing the conversions manually, but is there a way to use bins or ranges with sklearn's LabelEncoder? For example:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
A = ["paris", "memphis"]
B = ["tokyo", "amsterdam"]
le.fit([A,B])
print(le.transform(["tokyo", "memphis", "paris","tokyo", "amsterdam"]))
Desired output: [2, 1, 1, 2, 2]
You could imagine doing the same with age ranges, distances, etc. Is there a way to do this?
As far as I know there is no way to do this with LabelEncoder, but making a custom transform function should work.
Edit: Updated the code to deal with items that occur in both bins (the first matching bin wins) or in none of them (encoded as None).
from sklearn.base import TransformerMixin

class BinnedLabelEncoder(TransformerMixin):
    def transform(self, X, *_, start_index=1):
        result = []
        for item in X:
            # Use the first group that contains the item.
            for group_id, group in enumerate(self.group_list):
                if item in group:
                    result.append(group_id + start_index)
                    break
            else:
                # The loop finished without a break: the item is in no group.
                result.append(None)
        return result

    def fit(self, group_list, *_):
        # group_list is a list of bins, each bin being a collection of labels.
        self.group_list = group_list
        return self
You can use this with the code from your question:
le = BinnedLabelEncoder()
A = ["paris", "memphis"]
B = ["tokyo", "amsterdam"]
le.fit([A,B])
print(le.transform(["tokyo", "memphis", "paris","tokyo", "amsterdam"]))
Output:
[2, 1, 1, 2, 2]
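For numeric ranges such as the ages or distances mentioned in the question, NumPy's digitize may be a simpler fit than a custom class. The following is only a minimal sketch: the bin edges (18 and 65) and the sample ages are made-up values for illustration, and the + 1 just makes the labels start at 1 like the output above.

```python
import numpy as np

# Hypothetical bin edges: [0, 18) -> 1, [18, 65) -> 2, [65, inf) -> 3
edges = np.array([18, 65])
ages = np.array([5, 18, 30, 64, 65, 90])

# np.digitize returns, for each value, how many edges are <= that value
# (with the default right=False), i.e. a 0-based bin index.
labels = np.digitize(ages, edges) + 1

print(labels.tolist())  # [1, 2, 2, 2, 3, 3]
```

Values below the first edge fall into bin 1 and values at or above the last edge fall into the last bin, so unlike the class above there is no None case here.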