Multi label encoding for classes with duplicates

How can I n-hot encode a column of lists with duplicates?

I'm looking for something like sklearn's MultiLabelBinarizer that counts the occurrences of each class instead of binarizing.

Example input:

x = pd.Series([['a', 'b', 'a'], ['b', 'c'], ['c','c']])

Expected output:

    a   b   c
0   2   1   0
1   0   1   1
2   0   0   2

asked Aug 06 '19 07:08

brandoldperson

1 Answer

I have written a new class MultiLabelCounter based on the MultiLabelBinarizer code.

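import itertools


class MultiLabelCounter:
    """Like MultiLabelBinarizer, but counts repeated labels instead of binarizing."""

    def __init__(self, classes=None):
        # Optional fixed class ordering; if None, classes are inferred in fit().
        self.classes = classes

    def fit(self, y):
        if self.classes is None:
            self.classes_ = sorted(set(itertools.chain.from_iterable(y)))
        else:
            self.classes_ = list(self.classes)
        # Map each class label to its column index.
        self.mapping = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        yt = []
        for labels in y:
            # One count per known class, in the order of self.classes_.
            data = [0] * len(self.classes_)
            for label in labels:
                data[self.mapping[label]] += 1
            yt.append(data)
        return yt

    def fit_transform(self, y):
        return self.fit(y).transform(y)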

import pandas as pd
x = pd.Series([['a', 'b', 'a'], ['b', 'c'], ['c','c']])

mlc = MultiLabelCounter()
mlc.fit_transform(x)

# [[2, 1, 0], [0, 1, 1], [0, 0, 2]]
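
The result above is a plain list of lists, while the question asks for a labelled table. As a rough alternative sketch (not part of the class above), the same counts can be built directly as a DataFrame with collections.Counter. The helper name count_labels is made up for this illustration, and sorting the columns is an assumption chosen to match the expected output.

```python
from collections import Counter

import pandas as pd


def count_labels(series):
    """Count label occurrences per row (hypothetical helper, for illustration only)."""
    # Each Counter becomes one row; labels missing from a row show up as NaN.
    counts = pd.DataFrame([Counter(labels) for labels in series],
                          index=series.index)
    # Replace NaN with 0 and cast back to integers.
    counts = counts.fillna(0).astype(int)
    # Sort the columns so the order matches the sorted classes of the answer above.
    return counts.reindex(columns=sorted(counts.columns))


x = pd.Series([['a', 'b', 'a'], ['b', 'c'], ['c', 'c']])
result = count_labels(x)
print(result)
```

For the example input this should produce the a/b/c table from the question. Unlike a fitted MultiLabelCounter, this sketch does not remember the classes, so it cannot transform new data into the same column layout without an extra reindex step.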

answered Sep 30 '22 20:09

Venkatachalam
