Okay, I'll say up front that I'm entirely new to scikit-learn and data science. Here is the issue and my research on the problem so far. Code is at the bottom.
I'm trying to do type recognition (like digits, for example) with a BernoulliRBM, and I'm trying to find the correct parameters with GridSearchCV. However, I don't see anything happening. In a lot of examples that use the verbosity settings I see output and progress, but mine just says:
Fitting 3 folds for each of 15 candidates, totalling 45 fits
Then it sits there and does nothing... forever (or for 8 hours, the longest I've waited with high verbosity settings).
I have a pretty large data set (1000 2D arrays, each of size 428 by 428), so this might be the problem, but I've also set the verbosity to 10, so I feel like I should be seeing some kind of output or progress. As for my "target", it is just a 0 or a 1: either it is the object I'm looking for (1), or it isn't (0).
I have tampered with various parameters of GridSearchCV and tried creating fake (smaller) data sets to practice on.
def network_trainer(self, data, files):
    train_x, test_x, train_y, test_y = train_test_split(data, files, test_size=0.2, random_state=0)
    parameters = {'learning_rate':np.arange(.25, .75, .1), 'n_iter':[5, 10, 20]}
    model = BernoulliRBM(random_state=0, verbose=True)
    model.cv = 2
    model.n_components = 2
    logistic = linear_model.LogisticRegression()
    pipeline = Pipeline(steps=[('model', model), ('clf', logistic)])
    gscv = grid_search.GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=10)
    gscv.fit(train_x, train_y)
    print(gscv.best_params_)
I'd really appreciate a nudge in the right direction here. Thanks for considering my issue.
verbose: set it to 1 or higher to get progress output while GridSearchCV fits the data; higher values print more detail. n_jobs: the number of processes to run in parallel for this task; if it is -1, all available processors are used. Now, let us see how to use GridSearchCV to improve the accuracy of our model.
Verbose is a general programming term for producing lots of logging output. You can think of it as asking the program to "tell me everything about what you are doing all the time". Just set it to True (or to a positive integer, where one is expected) and see what happens.
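As a rough illustration of these two settings, the sketch below runs a small grid search on synthetic data. The data set from `make_classification`, the `LogisticRegression` estimator and the `C` grid are placeholders chosen for the example, not the setup from the question. It uses `n_jobs=1` so that progress messages come from the main process; with parallel workers the messages may, depending on the environment, show up late or not at all.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Small synthetic binary problem (an assumption for illustration only).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Three candidate values for the regularisation strength C.
param_grid = {"C": [0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=3,       # 3 folds per candidate, so 9 fits plus the final refit
    n_jobs=1,   # single process; -1 would use all available processors
    verbose=1,  # print the "Fitting ... folds" summary; raise for more detail
)
search.fit(X, y)
print(search.best_params_)
```

Starting with a tiny grid and a single process like this is one way to confirm that the search runs at all before scaling up the data or switching to `n_jobs=-1`.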
scoring: the evaluation metric used to measure model performance and decide on the best hyperparameters; if not specified, the estimator's own score method is used. cv: the cross-validation strategy, usually an integer giving the number of splits (a cross-validation splitter object can be passed instead). In recent scikit-learn versions the default is five; older releases defaulted to three.
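To make `scoring` and `cv` concrete for a pipeline like the one in the question, here is a minimal sketch. It assumes the bundled `load_digits` data (8 by 8 images) as a small stand-in for the 428 by 428 arrays, a made-up binary target ("is this digit a 0?"), and illustrative grid values. Two details matter: the images are flattened to one row per sample, and parameters of a pipeline step are addressed as `<step name>__<parameter>`, so unprefixed names such as `learning_rate` would not be recognised by the `Pipeline`. Note also that `cv` is passed to `GridSearchCV` itself rather than set on the estimator.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

digits = load_digits()
# Flatten (n_samples, 8, 8) images to (n_samples, 64) and scale to [0, 1],
# the input range BernoulliRBM expects. Only 300 samples to keep it quick.
X = digits.images[:300].reshape(300, -1) / 16.0
# Hypothetical binary target: 1 if the image shows the digit 0, else 0.
y = (digits.target[:300] == 0).astype(int)

pipeline = Pipeline(steps=[
    ("rbm", BernoulliRBM(n_components=16, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step-prefixed parameter names: "rbm__..." targets the BernoulliRBM step.
param_grid = {
    "rbm__learning_rate": [0.05, 0.1],
    "rbm__n_iter": [5, 10],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="accuracy",  # metric used to rank the candidates
    cv=2,                # number of cross-validation folds
    n_jobs=1,
    verbose=1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

With 4 candidates and 2 folds this runs 8 fits plus one refit, which is small enough to check that progress output appears before moving to the full data.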
GridSearchCV implements a “fit” and a “score” method. It also implements “score_samples”, “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.
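A short sketch of what that delegation looks like in practice, again on placeholder synthetic data with a `LogisticRegression` estimator (both are assumptions for the example). After fitting with the default `refit=True`, calls such as `predict` and `predict_proba` on the search object are forwarded to the best estimator found.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data and grid, for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}, cv=3)
search.fit(X, y)

# These calls are delegated to search.best_estimator_.
labels = search.predict(X[:5])
probs = search.predict_proba(X[:5])
print(labels.shape, probs.shape)

# score() uses the scoring setting, or the estimator's own score by default.
print(search.score(X, y))
```

Methods that the underlying estimator does not implement are not available on the search object either.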
Okay, so just to summarize everything I've figured out about it over the past few days.
Again I'd like to thank @Barmaley.exe for the initial tip.