I have 3 different feature sets for a given set of audio files. Each of them is a feature matrix, stored as an array, that was extracted from the audio files using a different technique.
What I would like to do is train them together in a single classifier (using a pipeline). I have read this, this, and the blog linked from the second link, but those deal with the different extraction methods themselves and then with applying the classifiers. Since I already have the extracted data as described above, I want to know what to do next, i.e. how to combine the feature sets into a pipeline.
I know one cannot ask for direct code here; I just want pointers on how to combine data (maybe using a pipeline) that was extracted by different methods, in order to classify it with, for example, an SVM.
Assuming that you want to feed each set of features into an independent model and then ensemble their results together, I'll write an answer below. However, if you simply want to use the features from all 3 extraction techniques in a single model, then just append them together into a single dataset and use it for training.
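For that simpler option, a minimal sketch (assuming your three matrices are NumPy arrays named X_a, X_b, X_c with one row per audio file, and y holds the labels; those names are my own):

import numpy as np
from sklearn.svm import SVC

# Column-wise stack: row i of each matrix must describe the same audio file.
X = np.hstack([X_a, X_b, X_c])
clf = SVC(kernel='rbf')
clf.fit(X, y)   # y: one class label per audio file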
I think the easiest way to do this within a Pipeline is to create a single (978 x 965) pandas DataFrame that includes the features from all three techniques. Then, within your pipeline, you can define a custom transformer class that selects groups of features by name; for example, this should work:
from sklearn.base import BaseEstimator, TransformerMixin

class VarSelect(BaseEstimator, TransformerMixin):
    def __init__(self, keys):
        self.keys = keys     # list of column names to select
    def fit(self, x, y=None):
        return self          # stateless, nothing to learn
    def transform(self, df):
        return df[self.keys].values
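To make those keys concrete: assuming your three raw feature arrays are named feats_a, feats_b, and feats_c (hypothetical names), the combined DataFrame and the column-name groups used later could be built like this:

import pandas as pd

def to_df(arr, prefix):
    # one prefixed column name per feature, so each group can be selected later
    return pd.DataFrame(arr, columns=[f'{prefix}_{i}' for i in range(arr.shape[1])])

df = pd.concat([to_df(feats_a, 'a'), to_df(feats_b, 'b'), to_df(feats_c, 'c')], axis=1)
vars_a = [c for c in df.columns if c.startswith('a_')]
vars_b = [c for c in df.columns if c.startswith('b_')]
vars_c = [c for c in df.columns if c.startswith('c_')]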
Then you will need a simple wrapper class that fits a model and, on transform, returns its predictions (this is what lets you stack models inside a FeatureUnion). Something like the following should work, depending on whether your problem is regression or classification:
from pandas import DataFrame

class ModelClassTransformer(BaseEstimator, TransformerMixin):
    # Wraps a classifier so that transform() yields its class probabilities.
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict_proba(X))

class ModelRegTransformer(BaseEstimator, TransformerMixin):
    # Wraps a regressor (or classifier) so that transform() yields its predictions.
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))
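Note that ModelClassTransformer relies on predict_proba, which for an SVC is only available when it is constructed with probability=True. ModelRegTransformer only calls predict, so it also works with a classifier's default settings (returning class labels rather than probabilities), which is what the example below does.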
Now that you have all of these pieces, you can create a pipeline that trains individual models on subsets of your dataset and then stacks their predictions into a final ensemble model. An example pipeline using a bunch of SVMs (as you requested) could look like:
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([
    ('union', FeatureUnion([
        # One sub-pipeline per extraction technique: select that technique's
        # columns, scale them, and output an SVM's predictions.
        ('modelA', Pipeline([
            ('var', VarSelect(keys=vars_a)),
            ('scl', StandardScaler()),
            ('svm', ModelRegTransformer(SVC(kernel='rbf'))),
        ])),
        ('modelB', Pipeline([
            ('var', VarSelect(keys=vars_b)),
            ('scl', StandardScaler()),
            ('svm', ModelRegTransformer(SVC(kernel='rbf'))),
        ])),
        ('modelC', Pipeline([
            ('var', VarSelect(keys=vars_c)),
            ('scl', StandardScaler()),
            ('svm', ModelRegTransformer(SVC(kernel='rbf'))),
        ])),
    ])),
    # Final ensemble: an SVM trained on the stacked sub-model predictions.
    ('scl', StandardScaler()),
    ('svm', SVC(kernel='rbf')),
])
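You can then fit and use the stacked model like any other estimator; a short sketch, with df and y as above and df_test a hypothetical held-out DataFrame containing the same columns:

pipe.fit(df, y)                # trains the three sub-models and the final SVM
preds = pipe.predict(df_test)  # df_test must have the same columns as df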