GridSearchCV not reporting progress at high verbosity

I'll say up front that I'm entirely new to scikit-learn and data science. Here is the issue and my research on it so far, with the code at the bottom.

Summary

I'm trying to do type recognition (like digit recognition, for example) with a BernoulliRBM, and I'm trying to find the right parameters with GridSearchCV. However, I don't see anything happening. Many examples that use verbosity settings show output and progress, but mine just prints:

Fitting 3 folds for each of 15 candidates, totalling 45 fits

Then it sits there and does nothing... forever (or at least for 8 hours, the longest I've waited with high verbosity settings).

I have a pretty large data set (1000 2D arrays, each of size 428 by 428), so this might be the problem. But I've also set the verbosity to 10, so I feel like I should be seeing some kind of output or progress. As for my "target", it is just a 0 or a 1: either a sample is the object I'm looking for (1), or it isn't (0).
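
For reference, the numbers in that log line and the size of each sample can be reproduced with a short sketch. It assumes a current scikit-learn (ParameterGrid from sklearn.model_selection) and reuses the grid values from the code below; the variable names are only illustrative.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid values as in the question's code.
param_grid = {
    "learning_rate": np.arange(0.25, 0.75, 0.1),
    "n_iter": [5, 10, 20],
}

# 5 learning rates x 3 n_iter values
n_candidates = len(ParameterGrid(param_grid))
# Fold count reported in the log line
n_folds = 3
total_fits = n_candidates * n_folds

# Feature count of one 428 x 428 sample once it is flattened into a row
n_features = 428 * 428

print(n_candidates, total_fits, n_features)
```

So each of the 45 fits trains on rows with roughly 183 thousand features, which by itself can make a single fit slow before any progress line is printed.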

Previous Research

I looked into sklearn.preprocessing to see if it was necessary; it doesn't seem to be the issue (but again, I'm entirely new to this).
I increased the verbosity.
I switched from using a 3D list of data to using a list of SciPy CSR matrices.
I waited 8 hours with high verbosity settings and still didn't see anything happening.
I switched from not using a pipeline to using one.

I tampered with various GridSearchCV parameters and tried creating fake (smaller) data sets to practice on.

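# Imports assumed for a current scikit-learn; the original post omitted them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

def network_trainer(self, data, files):
    train_x, test_x, train_y, test_y = train_test_split(data, files, test_size=0.2, random_state=0)

    # Grid keys for a Pipeline need the step name as a prefix ('model__...').
    parameters = {'model__learning_rate': np.arange(.25, .75, .1), 'model__n_iter': [5, 10, 20]}
    model = BernoulliRBM(n_components=2, random_state=0, verbose=True)

    logistic = LogisticRegression()
    pipeline = Pipeline(steps=[('model', model), ('clf', logistic)])

    # cv is a GridSearchCV argument; setting model.cv had no effect (hence the 3 folds above).
    gscv = GridSearchCV(pipeline, parameters, cv=3, n_jobs=-1, verbose=10)
    gscv.fit(train_x, train_y)
    print(gscv.best_params_)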

I'd really appreciate a nudge in the right direction here. Thanks for considering my issue.

asked Jan 17 '15 23:01

Isaac GS

1 Answer

To summarize everything I've figured out about this over the past few days:

On Windows 8.1, don't set n_jobs to anything other than 1 if you still want verbose output.
In my case, even with n_jobs = 1, all of my processor cores were still involved in the calculations, so this is either a bug or something that should be better documented.
I made the horrible mistake of passing a list of CSR matrices as the data. So, basically, read the documentation, and then read it again before you ask questions.
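
The points above can be put together in a minimal sketch, assuming a current scikit-learn and synthetic stand-in data. The array sizes, variable names, and the tiny grid are made up for illustration and are not the original data. It stacks the images into a single 2D array of shape (n_samples, n_features) instead of a list of matrices, and keeps n_jobs=1 so that progress is printed from the main process.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)

# Hypothetical stand-in data: 60 binary "images" of 8 x 8
# instead of 1000 images of 428 x 428.
images = (rng.rand(60, 8, 8) > 0.5).astype(np.float64)
labels = np.array([0, 1] * 30)

# Flatten each image into one row: shape (n_samples, n_features).
X = images.reshape(len(images), -1)

pipeline = Pipeline(steps=[
    ("model", BernoulliRBM(n_components=2, random_state=0)),
    ("clf", LogisticRegression()),
])

# Pipeline parameters are addressed as <step name>__<parameter>.
param_grid = {
    "model__learning_rate": [0.25, 0.45],
    "model__n_iter": [5, 10],
}

# n_jobs=1 runs every fit in the main process, so the verbose
# progress lines are written directly to the console.
search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=1, verbose=10)
search.fit(X, labels)

print(X.shape)
print(search.best_params_)
```

As for the busy cores with n_jobs = 1, one possible explanation (not confirmed here) is a multithreaded linear algebra library underneath NumPy, which runs independently of the n_jobs setting.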

Again, I'd like to thank @Barmaley.exe for the initial tip.

answered Oct 05 '22 11:10

Isaac GS

Tags: python, machine-learning, scikit-learn