
Python sklearn's labelencoder with categorical bins

I have been doing the conversions manually, but is there a way to use bins or ranges with sklearn's LabelEncoder? Something like:

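from sklearn.preprocessing import LabelEncoder

# Desired usage only: LabelEncoder expects a flat 1-D list of labels, not groups
le = LabelEncoder()
A = ["paris", "memphis"]
B = ["tokyo", "amsterdam"]
le.fit([A, B])
print(le.transform(["tokyo", "memphis", "paris", "tokyo", "amsterdam"]))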

desired output --> [2,1,1,2,2]

Or you could imagine using age ranges, distances, etc. Is there a way to do this?

Joe asked Jul 17 '18 13:07


1 Answer

As far as I know there is no way to do this with LabelEncoder, but making a custom transform function should work.

Edit: Updated the code to handle items that occur in more than one bin (the first matching bin wins) or in none of the bins (encoded as None).

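from sklearn.base import TransformerMixin

class BinnedLabelEncoder(TransformerMixin):
    """Encode each item as the index of the group (bin) that contains it."""

    def fit(self, group_list, *_):
        # group_list is a list of groups, e.g. [["paris", "memphis"], ["tokyo", "amsterdam"]]
        self.group_list = group_list
        return self

    def transform(self, X, *_, start_index=1):
        result = []
        for item in X:
            for group_id, group in enumerate(self.group_list):
                if item in group:
                    # The first matching group wins if an item is in several groups
                    result.append(group_id + start_index)
                    break
            else:
                # for/else: the loop finished without a break, so no group matched
                result.append(None)
        return result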

You can use this with the code from your question:

le = BinnedLabelEncoder()
A = ["paris", "memphis"]
B = ["tokyo", "amsterdam"]
le.fit([A,B])
print(le.transform(["tokyo", "memphis", "paris","tokyo", "amsterdam"]))

output

[2, 1, 1, 2, 2]
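
For the numeric case mentioned in the question (age ranges, distances), a similar idea can be sketched with numpy.digitize instead of membership tests. This is only a minimal sketch: the helper name encode_ranges, the edge values, and the sample ages are made up for illustration and are not part of the original answer. Each range is half-open, so a value equal to an edge falls into the upper range.

```python
import numpy as np

def encode_ranges(values, edges, start_index=1):
    """Map each number to the id of the range it falls in.

    With edges [18, 65] the ranges are: below 18, 18 up to (not including) 65,
    and 65 or above. `encode_ranges` is a hypothetical helper, not a sklearn API.
    """
    # np.digitize returns 0 for values below the first edge,
    # 1 for values in [edges[0], edges[1]), and so on.
    return (np.digitize(values, edges) + start_index).tolist()

ages = [5, 17, 18, 42, 70]  # illustrative data
print(encode_ranges(ages, edges=[18, 65]))  # [1, 1, 2, 2, 3]
```

The edges must be sorted in increasing order for this to give meaningful ids. Unlike BinnedLabelEncoder, every value gets an id, because the first and last ranges are open-ended.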
Louic answered Oct 22 '22 07:10