What is the difference between the two? It seems that both create new columns, whose number equals the number of unique categories in the feature, and then assign 0s and 1s to data points depending on which category they belong to.
LabelBinarizer makes this process easy with the transform method. At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method. Read more in the User Guide.
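A minimal sketch of that round trip, using made-up labels to show `transform` (via `fit_transform`) and `inverse_transform`:

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
y = ["yes", "no", "maybe", "yes"]

# one indicator column per class, classes sorted alphabetically
Y = lb.fit_transform(y)
print(lb.classes_)   # ['maybe' 'no' 'yes']
print(Y)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]]

# inverse_transform maps the indicator matrix back to the labels
print(lb.inverse_transform(Y))   # ['yes' 'no' 'maybe' 'yes']
```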
LabelBinarizer is a scikit-learn class that accepts categorical data as input and returns a NumPy array. Unlike LabelEncoder, it encodes the data into dummy variables indicating the presence or absence of a particular label.
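The contrast is easy to see on the same toy data (example data of my own choosing): LabelEncoder returns one integer per label, while LabelBinarizer returns one indicator column per class.

```python
from sklearn.preprocessing import LabelEncoder, LabelBinarizer

colors = ["red", "green", "blue", "green"]

le_out = LabelEncoder().fit_transform(colors)   # one integer per label
lb_out = LabelBinarizer().fit_transform(colors) # one 0/1 column per class

print(le_out)   # [2 1 0 1]  (blue=0, green=1, red=2, sorted alphabetically)
print(lb_out)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 1 0]]
```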
OneHotEncoder: encodes categorical integer features using a one-hot (aka one-of-K) scheme. The input to this transformer should be a matrix of integers, denoting the values taken on by categorical (discrete) features. The output will be a sparse matrix where each column corresponds to one possible value of one feature.
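A short sketch of that behavior with a hypothetical two-feature integer matrix; each feature gets its own block of indicator columns, and the result is sparse unless converted:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# two integer-coded categorical features: the first takes values {0, 1, 2},
# the second takes values {0, 1}
X = np.array([[0, 1],
              [1, 0],
              [2, 1]])

enc = OneHotEncoder()                 # returns a sparse matrix by default
dense = enc.fit_transform(X).toarray()

# 3 categories + 2 categories -> 5 output columns
print(dense)
# [[1. 0. 0. 0. 1.]
#  [0. 1. 0. 1. 0.]
#  [0. 0. 1. 0. 1.]]
```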
As you can see, we end up with three new columns of 1s and 0s, depending on the country that each row represents. That is the difference between Label Encoding and One Hot Encoding.
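A reconstruction of that kind of country example (the country values here are my own illustration, not the original data; note that passing strings directly to OneHotEncoder requires scikit-learn >= 0.20):

```python
from sklearn.preprocessing import OneHotEncoder

# a hypothetical single country column
countries = [["France"], ["Spain"], ["Germany"], ["Spain"]]

enc = OneHotEncoder()
onehot = enc.fit_transform(countries).toarray()

print(enc.categories_)  # [array(['France', 'Germany', 'Spain'], dtype=object)]
print(onehot)           # three new columns, one per country
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```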
A simple example that encodes an array using LabelEncoder, OneHotEncoder, and LabelBinarizer is shown below.
I see that OneHotEncoder needs the data in integer-encoded form first before converting it to its respective encoding, which is not required in the case of LabelBinarizer.
```python
from numpy import array
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelBinarizer

# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
values = array(data)
print("Data:", values)

# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print("Label Encoder:", integer_encoded)

# onehot encode (needs a 2-D array; use `sparse=False` on scikit-learn < 1.2)
onehot_encoder = OneHotEncoder(sparse_output=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print("OneHot Encoder:", onehot_encoded)

# binary encode
lb = LabelBinarizer()
print("Label Binarizer:", lb.fit_transform(values))
```
Another good link that explains OneHotEncoder: Explain onehotencoder using python
There may be other valid differences between the two which experts can probably explain.
A difference is that you can use OneHotEncoder for multi-column data, while LabelBinarizer and LabelEncoder only handle a single column.
```python
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

X = [["US", "M"], ["UK", "M"], ["FR", "F"]]
OneHotEncoder().fit_transform(X).toarray()
# array([[0., 0., 1., 0., 1.],
#        [0., 1., 0., 0., 1.],
#        [1., 0., 0., 1., 0.]])
```
```python
LabelBinarizer().fit_transform(X)
# ValueError: Multioutput target data is not supported with label binarization
LabelEncoder().fit_transform(X)
# ValueError: bad input shape (3, 2)
```
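If you do want to use the single-column encoders on multi-column data, one workaround (a sketch of my own, not from the original answer) is to apply a separate encoder to each column:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

X = np.array([["US", "M"], ["UK", "M"], ["FR", "F"]])

# encode each column with its own LabelEncoder
encoded = np.column_stack(
    [LabelEncoder().fit_transform(X[:, i]) for i in range(X.shape[1])]
)
print(encoded)
# [[2 1]
#  [1 1]
#  [0 0]]
```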