
Combining feature extraction classes in scikit-learn

I'm using sklearn.pipeline.Pipeline to chain feature extractors and a classifier. Is there a way to combine multiple feature extraction classes (for example the ones from sklearn.feature_extraction.text) in parallel and join their output?

My code right now looks as follows:

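from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier())])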

It results in the following:

vect -> tfidf -> clf

I want to be able to specify a pipeline that looks as follows:

vect1 -> tfidf1 \
                 -> clf
vect2 -> tfidf2 /
Daniel asked Oct 04 '12 06:10




1 Answer

This has been implemented recently in the master branch of scikit-learn under the name FeatureUnion:

http://scikit-learn.org/dev/modules/pipeline.html#feature-union
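
As a rough sketch of how the layout from the question could be written with FeatureUnion: each branch is its own small Pipeline, and FeatureUnion fits the branches on the same input and concatenates their outputs column-wise before the classifier. The analyzer settings, the branch names 'words' and 'chars', and the toy documents and labels are assumptions made for illustration; they are not from the question.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# Branch 1 (assumed settings): word counts followed by TF-IDF weighting.
word_branch = Pipeline([
    ('vect1', CountVectorizer(analyzer='word')),
    ('tfidf1', TfidfTransformer()),
])

# Branch 2 (assumed settings): character n-gram counts followed by TF-IDF.
char_branch = Pipeline([
    ('vect2', CountVectorizer(analyzer='char', ngram_range=(2, 3))),
    ('tfidf2', TfidfTransformer()),
])

# FeatureUnion runs both branches on the same raw documents and stacks
# their feature matrices side by side; the classifier sees the joined matrix.
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('words', word_branch),
        ('chars', char_branch),
    ])),
    ('clf', SGDClassifier(random_state=0)),
])

# Toy data, purely illustrative.
docs = [
    'the cat sat on the mat',
    'dogs chase cats in the yard',
    'stock prices fell sharply today',
    'the market rallied after the report',
]
labels = [0, 0, 1, 1]

pipeline.fit(docs, labels)
predictions = pipeline.predict(docs)

# Width of the joined feature matrix.
joined = pipeline.named_steps['features'].transform(docs)
print(joined.shape)
```

The joined matrix has one row per document, and its column count is the sum of the column counts produced by the two branches, which is what the "vect1 -> tfidf1 / vect2 -> tfidf2 -> clf" diagram asks for.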

ogrisel answered Oct 12 '22 10:10
