
Keras KerasClassifier gridsearch TypeError: can't pickle _thread.lock objects

The following code is throwing an error: TypeError: can't pickle _thread.lock objects

I can see that it likely has to do with passing the previous method in as a function in def fit(self, c_m). But I think this is correct according to the documentation: https://keras.io/scikit-learn-api/

I may be making a rookie mistake. If anyone sees the error in my code, I would appreciate the help.

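import pdb

import numpy as np
import pandas as pd
from scipy import stats
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

np.random.seed(7)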
y_dic = []

class NN:
    def __init__(self):
        self.X = None
        self.y = None
        self.model = None

    def clean_data(self):
        seed = 7
        np.random.seed(seed)
        dataset = pd.read_csv('/Users/isaac/pca_rfe_tsne_comparisons/Vital_intrusions.csv', delimiter=',', skiprows=0)
        dataset = dataset.iloc[:,1:6]
        self.X = dataset.iloc[:, 1:5]
        Y = dataset.iloc[:, 0]

        for y in Y:
            if y >= 8:
                y_dic.append(1)
            else:
                y_dic.append(0)
        self.y = y_dic

        self.X = np.asmatrix(stats.zscore(self.X, axis=0, ddof=1))
        self.y = to_categorical(self.y)


    def create_model(self):
        self.model = Sequential()
        self.model.add(Dense(4, input_dim=4, activation='relu'))
        self.model.add(Dense(4, activation='relu'))
        self.model.add(Dense(2, activation='sigmoid'))
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        pass

    def fit(self, c_m):
        model = KerasClassifier(build_fn=c_m, verbose=0)
        batch_size = [10, 20, 40, 60, 80, 100]
        epochs = [10, 50, 100]
        param_grid = dict(batch_size=batch_size, epochs=epochs)
        grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
        pdb.set_trace()
        grid_result = grid.fit(self.X, self.y)
        return (grid_result)

    def results(self, grid_result):
        print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
        means = grid_result.cv_results_['mean_test_score']
        stds = grid_result.cv_results_['std_test_score']
        params = grid_result.cv_results_['params']
        for mean, stdev, param in zip(means, stds, params):
            print("%f (%f) with: %r" % (mean, stdev, param))


def main():
    nn = NN()
    nn.clean_data()
    nn.create_model()
    grid_results = nn.fit(nn.create_model)
    nn.results(grid_results)

if __name__ == "__main__":
    main()

OK, a follow-up to this. Thanks for your comments, @MarcinMożejko; you were right about this. There were more errors I should mention. In def fit(), I wrote model = KerasClassifier, not self.model = KerasClassifier. I wanted to mention that in case anyone is looking at the code. I'm now getting a new error on the same line:

AttributeError: 'NoneType' object has no attribute 'loss'.

I can track this back to scikit_learn.py:

loss_name = self.model.loss
if hasattr(loss_name, '__name__'):
    loss_name = loss_name.__name__
if loss_name == 'categorical_crossentropy' and len(y.shape) != 2:
    y = to_categorical(y)

I'm not sure how to solve this, as I set the loss in self.model.compile. I tried changing it to binary_crossentropy, but that had no effect. Any further thoughts?
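
For readers who hit the same AttributeError: the scikit_learn.py snippet above reads .loss from whatever build_fn returned, and create_model as posted ends in pass, so it returns None. The sketch below reproduces that mechanism without Keras. FakeModel, FakeWrapper, and the two build functions are hypothetical stand-ins invented for illustration, not Keras classes.

```python
class FakeModel:
    """Hypothetical stand-in for a compiled Keras model."""
    loss = 'categorical_crossentropy'


class FakeWrapper:
    """Hypothetical stand-in that mimics how the wrapper uses build_fn."""

    def __init__(self, build_fn):
        self.build_fn = build_fn
        self.model = None

    def fit(self):
        # The wrapper stores whatever build_fn returns ...
        self.model = self.build_fn()
        # ... and then reads .loss from it, as in the snippet above.
        return self.model.loss


def build_without_return():
    model = FakeModel()
    # No return statement, so calling this function evaluates to None.


def build_with_return():
    model = FakeModel()
    return model


try:
    FakeWrapper(build_without_return).fit()
    outcome = 'no error'
except AttributeError as exc:
    outcome = str(exc)

print(outcome)
print(FakeWrapper(build_with_return).fit())
```

Under that assumption, ending create_model with return self.model instead of pass would hand the wrapper a compiled model to read the loss from.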

Isaac asked Feb 10 '18 05:02


1 Answer

The problem lies in this line of code:

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)

Unfortunately, Keras does not currently support pickling your model, which sklearn needs in order to apply multiprocessing (the linked discussion has more detail). With n_jobs=-1, sklearn tries to send the estimator to worker processes, and that is where the pickling fails. To make this code work, set:

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
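
The pickling failure itself can be reproduced without Keras. The following is a minimal sketch, assuming only that some object reachable from the estimator holds a thread lock, as the error message suggests. The two dictionaries are hypothetical stand-ins, not Keras internals.

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# Plain data pickles without trouble.
plain = {'weights': [0.1, 0.2]}

# The same data plus a thread lock does not: locks cannot be pickled.
with_lock = {'weights': [0.1, 0.2], 'lock': threading.Lock()}

plain_ok = can_pickle(plain)
lock_ok = can_pickle(with_lock)
print(plain_ok, lock_ok)
```

With n_jobs=1 the search runs in the current process, so nothing has to be pickled and the lock never becomes a problem.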

Marcin Możejko answered Oct 13 '22 22:10