Union of heterogeneous features

I have 3 different feature sets for a given set of audio files. Each is a feature matrix, stored as an array with the following dimensions:

  • feature 1: (978*153)
  • feature 2: (978*800)
  • feature 3: (978*12)

Each of these feature sets has been extracted from the audio files using a different technique.

What I would like to do is train them together in a given classifier (using a Pipeline). I have read this, this and the blog linked in the second link, but they deal with different extraction methods followed by classifiers. Since I already have the extracted data as described above, I want to know what to do next, i.e. how to combine the features into a pipeline.

I know one cannot ask for direct code here, so I just want pointers: how to combine data (maybe using a pipeline) that was extracted with different methods, and classify it with an SVM, for example.

Vyas asked Oct 04 '15 13:10
1 Answer

Assuming that you want to handle each set of features in an independent model and then ensemble their results together, I'll write an answer below. However, if you simply want to use the features from all 3 extraction techniques in a single model, just append them together into a single dataset and use that for training.
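For the simple append option, a minimal sketch with NumPy (the array names and random data are illustrative):

```python
import numpy as np

# Three illustrative feature matrices with the shapes from the question
feat1 = np.random.rand(978, 153)
feat2 = np.random.rand(978, 800)
feat3 = np.random.rand(978, 12)

# Stack column-wise: each row still corresponds to one audio file
X = np.hstack([feat1, feat2, feat3])
print(X.shape)  # (978, 965)
```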

I think the easiest way to do this within a Pipeline is to create a single (978*965) pandas DataFrame that contains the features from all three techniques. Then, within your pipeline, you can define a custom class that selects groups of features; for example, this should work:

from sklearn.base import BaseEstimator, TransformerMixin

class VarSelect(BaseEstimator, TransformerMixin):
    """Selects a named subset of columns from a DataFrame."""
    def __init__(self, keys):
        self.keys = keys
    def fit(self, x, y=None):
        return self
    def transform(self, df):
        return df[self.keys].values
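A sketch of how the combined DataFrame and the key lists (`vars_a`, `vars_b`, `vars_c`, used further down) might be built; the column-name prefixes and random data are illustrative, and `VarSelect` is repeated so the sketch is self-contained:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class VarSelect(BaseEstimator, TransformerMixin):
    def __init__(self, keys):
        self.keys = keys
    def fit(self, x, y=None):
        return self
    def transform(self, df):
        return df[self.keys].values

# Prefix each technique's columns so each group can be selected later
vars_a = ['a%d' % i for i in range(153)]
vars_b = ['b%d' % i for i in range(800)]
vars_c = ['c%d' % i for i in range(12)]

df = pd.DataFrame(np.random.rand(978, 965), columns=vars_a + vars_b + vars_c)

# Selecting one group recovers that technique's feature matrix
sub = VarSelect(keys=vars_a).fit_transform(df)
print(sub.shape)  # (978, 153)
```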

Then you will need a simple class that fits a model and exposes its predictions through transform (needed to stack your models together). Something like this should work, depending on whether your problem is classification or regression:

from pandas import DataFrame
from sklearn.base import BaseEstimator, TransformerMixin

class ModelClassTransformer(BaseEstimator, TransformerMixin):
    """Wraps a classifier so its class probabilities become features."""
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict_proba(X))

class ModelRegTransformer(BaseEstimator, TransformerMixin):
    """Wraps a regressor so its predictions become features."""
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))
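One caveat with the classification variant: `SVC` only exposes `predict_proba` when constructed with `probability=True`. A small self-contained sketch (random data, names illustrative):

```python
import numpy as np
from pandas import DataFrame
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.svm import SVC

class ModelClassTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict_proba(X))

X = np.random.rand(50, 4)
y = np.array([0, 1] * 25)  # two balanced classes

# probability=True is required for SVC.predict_proba
t = ModelClassTransformer(SVC(kernel='rbf', probability=True)).fit(X, y)
probs = t.transform(X)
print(probs.shape)  # (50, 2): one probability column per class
```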

Now that you have all of these things you can create a pipeline that trains individual models on subsets of your dataset and then stacks them together in a final ensembled model. An example pipeline using a bunch of SVMs (as you requested) could look like:

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

Pipeline([
    ('union', FeatureUnion([
        ('modelA', Pipeline([
            ('var', VarSelect(keys=vars_a)),
            ('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
            ('svm', ModelRegTransformer(SVC(kernel='rbf'))),
        ])),
        ('modelB', Pipeline([
            ('var', VarSelect(keys=vars_b)),
            ('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
            ('svm', ModelRegTransformer(SVC(kernel='rbf'))),
        ])),
        ('modelC', Pipeline([
            ('var', VarSelect(keys=vars_c)),
            ('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
            ('svm', ModelRegTransformer(SVC(kernel='rbf'))),
        ]))
    ])),
    ('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
    ('svm', SVC(kernel='rbf'))
])
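End to end, a pipeline of this shape is simply fit on the combined DataFrame. A runnable sketch with small random data (all sizes, column names, and labels are illustrative, not the question's real features):

```python
import numpy as np
import pandas as pd
from pandas import DataFrame
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

class VarSelect(BaseEstimator, TransformerMixin):
    def __init__(self, keys):
        self.keys = keys
    def fit(self, x, y=None):
        return self
    def transform(self, df):
        return df[self.keys].values

class ModelRegTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))

# Tiny stand-ins for the three feature groups
vars_a = ['a%d' % i for i in range(5)]
vars_b = ['b%d' % i for i in range(4)]
vars_c = ['c%d' % i for i in range(3)]
df = pd.DataFrame(np.random.rand(60, 12), columns=vars_a + vars_b + vars_c)
y = np.array([0, 1] * 30)

def sub_model(keys):
    # One per-group pipeline: select columns, scale, wrap an SVM
    return Pipeline([
        ('var', VarSelect(keys=keys)),
        ('scl', StandardScaler()),
        ('svm', ModelRegTransformer(SVC(kernel='rbf'))),
    ])

pipe = Pipeline([
    ('union', FeatureUnion([
        ('modelA', sub_model(vars_a)),
        ('modelB', sub_model(vars_b)),
        ('modelC', sub_model(vars_c)),
    ])),
    ('scl', StandardScaler()),
    ('svm', SVC(kernel='rbf')),
])

pipe.fit(df, y)
preds = pipe.predict(df)
print(preds.shape)  # (60,)
```

Note that `predict` here runs the full stack: each sub-model's predictions are concatenated by the FeatureUnion and fed to the final SVM.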
David answered Nov 15 '22 05:11