
GridSearchCV not reporting progress at high verbosity

Okay, I'll say up front that I'm entirely new to scikit-learn and data science. Here is the issue and what I've researched so far. Code at the bottom.

Summary

I'm trying to do type recognition (like digit recognition, for example) with a BernoulliRBM, and I'm using GridSearchCV to find good parameters. However, I don't see anything happening. In many examples that use the verbosity settings I see output and progress, but mine just prints:

Fitting 3 folds for each of 15 candidates, totalling 45 fits

Then it sits there and does nothing, seemingly forever (8 hours is the longest I've waited with high verbosity settings).
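For reference, the numbers in that message can be reproduced from the parameter grid in the code at the bottom. This is only a sketch using a plain Cartesian product, not scikit-learn's own bookkeeping; the learning rates are the rounded values that `np.arange(.25, .75, .1)` produces, and the fold count is taken from the message itself.

```python
from itertools import product

# Rounded values of np.arange(.25, .75, .1) from the code at the bottom
learning_rates = [0.25, 0.35, 0.45, 0.55, 0.65]
n_iters = [5, 10, 20]
n_folds = 3  # taken from the "Fitting 3 folds" message

# Every (learning_rate, n_iter) pair is one candidate
candidates = list(product(learning_rates, n_iters))
total_fits = len(candidates) * n_folds

print(len(candidates), total_fits)  # prints: 15 45
```

So the search has 45 separate pipeline fits to get through before it is done.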

I have a pretty large data set (1000 2D arrays, each 428 by 428), so that might be the problem, but I've also set the verbosity to 10, so I feel like I should be seeing some kind of output or progress. My "target" is just a 0 or a 1: either it is the object I'm looking for (1) or it isn't (0).
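To put that data size in perspective, here is a rough back-of-the-envelope estimate. It assumes each 428 by 428 array is flattened into one row and stored densely as float64; the dtype is an assumption, not something stated above.

```python
n_samples = 1000
height, width = 428, 428
bytes_per_value = 8  # assumption: dense float64 storage

# Flattening one 2D array gives one row of height * width features
n_features = height * width
dense_bytes = n_samples * n_features * bytes_per_value

print(n_features)                   # prints: 183184
print(round(dense_bytes / 1e9, 2))  # prints: 1.47 (GB)
```

That is about 183,000 features per sample and roughly 1.5 GB for a single dense copy, so each individual fit has a lot of work to do.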

Previous Research

  • I looked into sklearn.preprocessing to see whether it was necessary; it doesn't seem to be the issue (but again, I'm entirely new to this).
  • I increased the verbosity.
  • I switched from a 3D list of data to a list of scipy csr matrices.
  • I waited 8 hours with high verbosity settings and still saw nothing happening.
  • I switched from not using a pipeline to using one.
  • I tampered with various GridSearchCV parameters and tried creating fake (smaller) data sets to practice on.

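    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.neural_network import BernoulliRBM
    from sklearn.pipeline import Pipeline

    def network_trainer(self, data, files):
        train_x, test_x, train_y, test_y = train_test_split(data, files, test_size=0.2, random_state=0)

        # Pipeline parameters are addressed as <step name>__<parameter>.
        parameters = {'model__learning_rate': np.arange(.25, .75, .1), 'model__n_iter': [5, 10, 20]}
        # n_components is a constructor argument; BernoulliRBM has no cv setting.
        model = BernoulliRBM(n_components=2, random_state=0, verbose=True)

        logistic = LogisticRegression()
        pipeline = Pipeline(steps=[('model', model), ('clf', logistic)])

        # cv belongs to GridSearchCV; 3 folds matches the "Fitting 3 folds" message.
        gscv = GridSearchCV(pipeline, parameters, cv=3, n_jobs=-1, verbose=10)
        gscv.fit(train_x, train_y)
        print(gscv.best_params_)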

I'd really appreciate a nudge in the right direction here. Thanks for considering my issue.

Isaac GS asked Jan 17 '15 23:01



1 Answer

Okay, so just to summarize everything I've figured out about this over the past few days:

  • On Windows 8.1, don't set n_jobs to anything other than 1 if you still want verbose output.
  • In my case, even with n_jobs = 1, all of my processor cores were still involved in the calculations, so either this is a bug or it should be better documented.
  • I made the horrible mistake of using a list of csr matrices. So, basically, read the documentation, and then read it again before you ask questions.
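The points above can be sketched roughly as follows. This is a minimal example rather than the exact code I ended up with: it assumes a recent scikit-learn (where GridSearchCV lives in sklearn.model_selection), and the tiny random data set, the `images`, `labels` and `X` names, and the reduced grid are all hypothetical stand-ins chosen so it runs in seconds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data: 40 "images" of 8x8 with values in [0, 1)
rng = np.random.RandomState(0)
images = rng.rand(40, 8, 8)
labels = np.array([0, 1] * 20)  # binary target, as in the question

# One dense 2D array of shape (n_samples, n_features),
# not a list of 2D arrays or a list of csr matrices
X = images.reshape(len(images), -1)

pipeline = Pipeline(steps=[
    ("model", BernoulliRBM(n_components=2, random_state=0)),
    ("clf", LogisticRegression()),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
parameters = {"model__learning_rate": [0.25, 0.45], "model__n_iter": [5]}

# n_jobs=1 keeps everything in one process, which is what restored
# the verbose progress output for me
gscv = GridSearchCV(pipeline, parameters, cv=3, n_jobs=1, verbose=10)
gscv.fit(X, labels)
print(gscv.best_params_)
```

With data in this layout and n_jobs=1, each of the fits should report its progress as it runs.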

Again I'd like to thank @Barmaley.exe for the initial tip.

Isaac GS answered Oct 05 '22 11:10