I'm using sklearn.pipeline.Pipeline to chain feature extractors and a classifier. Is there a way to combine multiple feature extraction classes (for example the ones from sklearn.feature_extraction.text) in parallel and join their output?
My code right now looks as follows:
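    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import SGDClassifier

    pipeline = Pipeline([
        ('vect', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('clf', SGDClassifier())])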
It results in the following:
vect -> tfidf -> clf
I want to be able to specify a pipeline that looks as follows:
    vect1 -> tfidf1 \
                     -> clf
    vect2 -> tfidf2 /
This has been implemented recently in the master branch of scikit-learn under the name FeatureUnion:
http://scikit-learn.org/dev/modules/pipeline.html#feature-union
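As a rough illustration, the two-branch layout from the question could be written along these lines. This is only a sketch, assuming a scikit-learn version in which FeatureUnion can be imported from sklearn.pipeline. The step names, the word/character analyzers, and the tiny corpus are made up for the example and are not from the question.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# Branch 1 (assumed for illustration): word counts, then tf-idf weighting.
branch1 = Pipeline([
    ('vect1', CountVectorizer(analyzer='word')),
    ('tfidf1', TfidfTransformer()),
])

# Branch 2 (assumed for illustration): character n-gram counts, then tf-idf.
branch2 = Pipeline([
    ('vect2', CountVectorizer(analyzer='char', ngram_range=(2, 3))),
    ('tfidf2', TfidfTransformer()),
])

# FeatureUnion fits both branches on the same input and stacks their
# output columns side by side before handing them to the classifier.
pipeline = Pipeline([
    ('features', FeatureUnion([('branch1', branch1), ('branch2', branch2)])),
    ('clf', SGDClassifier(random_state=0)),
])

# Toy data, invented purely to make the sketch runnable.
docs = [
    'the cat sat on the mat',
    'dogs chase cats in the yard',
    'stock prices fell sharply today',
    'the market rallied after the report',
]
labels = [0, 0, 1, 1]

pipeline.fit(docs, labels)

# The combined matrix has one row per document and as many columns as
# both branches produce together.
combined = pipeline.named_steps['features'].transform(docs)
predictions = pipeline.predict(docs)
```

Each branch is itself a Pipeline, so the union sees it as a single transformer. Nested parameters can then be reached with the usual double-underscore names, for example features__branch2__vect2__ngram_range in a grid search.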