How to use the output from OneHotEncoder in sklearn?

Tags: python, pandas, classification, one-hot-encoding, scikit-learn

I have a pandas DataFrame with two categorical variables, an ID variable and a target variable (for classification). I managed to convert the categorical values with OneHotEncoder, which results in a sparse matrix.

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder()
# First I remapped the string values in the categorical variables to integers,
# as OneHotEncoder needs integers as input
... remapping code ...

ohe.fit(df[['col_a', 'col_b']])
ohe.transform(df[['col_a', 'col_b']])

But I have no clue how to use this sparse matrix in a DecisionTreeClassifier, especially when I want to add some other non-categorical variables from my dataframe later on. Thanks!

EDIT: In reply to the comment from miraculixx, I also tried the DataFrameMapper from sklearn-pandas:

from sklearn_pandas import DataFrameMapper

mapper = DataFrameMapper([
    ('id_col', None),
    ('target_col', None),
    (['col_a'], OneHotEncoder()),
    (['col_b'], OneHotEncoder())
])

t = mapper.fit_transform(df)

But then I get this error:

TypeError: no supported conversion for types : (dtype('O'), dtype('int64'), dtype('float64'), dtype('float64')).

asked Jul 21 '16 21:07

Bert Carremans

2 Answers

I see you are already using pandas, so why not use its get_dummies function?

import pandas as pd
df = pd.DataFrame([['rick','young'],['phil','old'],['john','teenager']],columns=['name','age-group'])

result

   name age-group
0  rick     young
1  phil       old
2  john  teenager

Now encode it with get_dummies:

pd.get_dummies(df)

result

   name_john  name_phil  name_rick  age-group_old  age-group_teenager  age-group_young
0          0          0          1              0                   0                1
1          0          1          0              1                   0                0
2          1          0          0              0                   1                0

You can then pass the new pandas DataFrame directly to sklearn's DecisionTreeClassifier.
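A minimal sketch of that last step, under some assumptions: the `income` and `target` columns and all of the values below are hypothetical and only there for illustration. Passing `columns=` limits the encoding to the categorical columns, so numeric columns stay as they are, and `dtype=float` keeps the whole frame numeric.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy data; 'income' and 'target' are hypothetical columns added for illustration.
df = pd.DataFrame({
    'name': ['rick', 'phil', 'john', 'rick'],
    'age-group': ['young', 'old', 'teenager', 'old'],
    'income': [30.0, 55.0, 5.0, 42.0],
    'target': [0, 1, 0, 1],
})

# Encode only the categorical columns; 'income' passes through unchanged.
X = pd.get_dummies(df.drop(columns='target'),
                   columns=['name', 'age-group'], dtype=float)
y = df['target']

# Fit the tree on the encoded frame and predict on the same rows.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)
```

One caveat with this approach: get_dummies derives the columns from whatever data it is given, so new data with different categories will produce a different set of columns than the one the tree was trained on.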

answered Sep 21 '22 17:09

Guiem Bosch

Look at this example from scikit-learn: http://scikit-learn.org/stable/auto_examples/ensemble/plot_feature_transformation.html#example-ensemble-plot-feature-transformation-py

The problem is that you are not passing the sparse matrix to xx.fit(); you are passing the original data.
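As a rough sketch of what fitting on the encoded matrix could look like: the column names `col_a`, `col_b` and `target_col` follow the question, while `num_col` and all of the values are hypothetical. It assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string categories directly, and uses `scipy.sparse.hstack` as one possible way to append non-categorical columns while staying sparse.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data; 'num_col' and all values are hypothetical, for illustration only.
df = pd.DataFrame({
    'col_a': ['x', 'y', 'x', 'z'],
    'col_b': ['p', 'q', 'q', 'p'],
    'num_col': [1.5, 3.0, 0.5, 2.0],
    'target_col': [0, 1, 0, 1],
})

ohe = OneHotEncoder()
X_cat = ohe.fit_transform(df[['col_a', 'col_b']])  # sparse one-hot matrix

# Append the non-categorical column to the encoded matrix, keeping it sparse.
X_num = csr_matrix(df[['num_col']].to_numpy())
X = hstack([X_cat, X_num]).tocsr()

# Fit on the encoded matrix X, not on the original DataFrame.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, df['target_col'])
predictions = clf.predict(X)
```

DecisionTreeClassifier accepts scipy sparse input, so there is no need to convert the matrix to a dense array first. The ID and target columns are left out of X on purpose: the ID carries no signal, and the target is passed separately as y.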

answered Sep 19 '22 17:09

Merlin
