The main goals are as follows:
- Apply StandardScaler to continuous variables
- Apply LabelEncoder and OneHotEncoder to categorical variables
The continuous variables need to be scaled, but a couple of the categorical variables are also of integer type. Applying StandardScaler to the whole DataFrame would therefore scale those integer-coded categorical variables as well, which is not what we want. Since continuous and categorical variables are mixed in a single Pandas DataFrame, what's the recommended workflow to approach this kind of problem?
The best example to illustrate my point is the Kaggle Bike Sharing Demand dataset, where season and weather are integer categorical variables.
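To make the setup concrete, here is a minimal sketch (assuming a small subset of the Bike Sharing Demand data with its usual column names such as season, weather, temp and humidity) of why scaling the whole frame is undesirable:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical subset: temp/humidity are continuous,
# season/weather are integer-coded categories.
df = pd.DataFrame({
    "season": [1, 2, 3, 4],
    "weather": [1, 1, 2, 3],
    "temp": [9.84, 13.6, 22.1, 30.3],
    "humidity": [81, 80, 77, 62],
})

# Scaling everything also transforms the categorical codes -- the problem described above.
scaled_all = StandardScaler().fit_transform(df)

# What we actually want: scale only the continuous columns.
scaled_cont = StandardScaler().fit_transform(df[["temp", "humidity"]])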
One-hot encoding: to improve model performance, one can first label-encode the categorical variables and then expand those integer codes into binary indicator columns, which is the most machine-readable form. Pandas' get_dummies() converts categorical variables into dummy/indicator variables.
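A minimal sketch of get_dummies on the (assumed) season and weather columns from above:

import pandas as pd

df = pd.DataFrame({"season": [1, 2, 3, 4], "weather": [1, 1, 2, 3]})

# Passing columns= forces the integer-coded columns to be treated as categories
# and expanded into 0/1 indicator columns.
dummies = pd.get_dummies(df, columns=["season", "weather"])
print(dummies.columns.tolist())
# ['season_1', 'season_2', 'season_3', 'season_4', 'weather_1', 'weather_2', 'weather_3']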
Check out the sklearn_pandas.DataFrameMapper
meta-transformer. Use it as the first step in your pipeline to perform column-wise data engineering operations:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, StandardScaler
from sklearn_pandas import DataFrameMapper

# Wrap each continuous column in a list so StandardScaler receives a 2-D array;
# LabelBinarizer expects a single 1-D column, so the bare column name is fine.
mapper = DataFrameMapper(
    [([continuous_col], StandardScaler()) for continuous_col in continuous_cols] +
    [(categorical_col, LabelBinarizer()) for categorical_col in categorical_cols]
)

pipeline = Pipeline([
    ("mapper", mapper),
    ("estimator", estimator),
])

# The final step is a predictor, so fit the pipeline rather than fit_transform it.
pipeline.fit(df, df["y"])
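Each tuple in the mapper pairs a column (or list of columns) with its own transformer, so scaling is applied only to the continuous columns and binarization only to the categorical ones; the mapper then concatenates the transformed columns back into a single feature matrix for the estimator.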
Also, you should be using sklearn.preprocessing.LabelBinarizer instead of a list of [LabelEncoder(), OneHotEncoder()].
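As a minimal sketch (on made-up integer weather codes), LabelBinarizer produces the one-hot columns in a single step:

from sklearn.preprocessing import LabelBinarizer

weather = [1, 1, 2, 3, 2]  # hypothetical integer weather codes

# LabelBinarizer goes straight from labels to an indicator matrix,
# replacing the LabelEncoder -> OneHotEncoder two-step.
onehot = LabelBinarizer().fit_transform(weather)
print(onehot)
# [[1 0 0]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]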