I implemented Random Forest classifiers (RF) from the Python scikit-learn package for an ML problem. In the first stage I used cross-validation (CV) to spot-check several algorithms, and RF is now my choice.
Later on I also checked what the OOB estimate of RF tells me. However, when I compare the value returned in 'oob_score_' with my results from CV, I see a large discrepancy.
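To make the comparison concrete, here is a minimal sketch of what I am doing (the dataset is a hypothetical stand-in generated with make_classification; my real data differs):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for my data
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)

# Mean CV accuracy (cross_val_score defaults to the estimator's .score, i.e. accuracy)
cv_acc = cross_val_score(rf, X, y, cv=5).mean()

rf.fit(X, y)
print("CV accuracy: %.3f" % cv_acc)
print("oob_score_:  %.3f" % rf.oob_score_)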
The scikit-learn doc tells me:
oob_score : bool
Whether to use out-of-bag samples to estimate the generalization error.
Because of the doc I assumed that the attribute 'oob_score_' is the error estimate. But while looking for reasons it also came to my mind that it might actually estimate the accuracy instead. That would be, at least a bit, closer to my CV results. I also checked the code and now believe it is the accuracy, but I wanted to be sure (in that case I find the doc misleading, BTW).
Is oob_score_ in scikit-learn accuracy or error estimation?
The out-of-bag (OOB) error is the average error for each training sample, calculated using predictions only from the trees that do not contain that sample in their respective bootstrap sample. This allows the RandomForestClassifier to be fit and validated while being trained [1].
Similarly, each OOB sample row is passed through every decision tree that did not contain that row in its bootstrap training data, and a majority prediction is noted for each row. Lastly, the OOB score is computed as the fraction of correctly predicted rows among the out-of-bag samples.
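A minimal sketch of that per-row mechanism (hypothetical data; after fitting with oob_score=True, oob_decision_function_ holds, for each row, the class-vote fractions from exactly those trees that never saw the row):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)

# Row i holds class-vote fractions from the trees whose bootstrap
# samples did not include row i; argmax is the majority prediction.
oob_majority = np.argmax(rf.oob_decision_function_, axis=1)
print((oob_majority == y).mean())  # fraction of correctly predicted OOB rows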
There's no such thing as a good oob_score on its own; it's the difference between valid_score and oob_score that matters. Think of oob_score as a score on some subset (say, oob_set) of the training set. To learn how it's created, refer to this.
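For example, you can compare the OOB score against a held-out validation score (a sketch on hypothetical data; the gap is the informative quantity, not the absolute value):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)

# It is this gap, not the raw oob_score_, that is worth watching.
print(rf.score(X_val, y_val) - rf.oob_score_)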
It is analogous to the .score method, which returns the accuracy of the model. It simply generalizes it to the OOB scenario. The documentation is indeed a bit misleading.
As you can see in the code at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py
for k in range(self.n_outputs_):
    if (predictions[k].sum(axis=1) == 0).any():
        warn("Some inputs do not have OOB scores. "
             "This probably means too few trees were used "
             "to compute any reliable oob estimates.")

    decision = (predictions[k] /
                predictions[k].sum(axis=1)[:, np.newaxis])
    oob_decision_function.append(decision)
    oob_score += np.mean(y[:, k] ==
                         np.argmax(predictions[k], axis=1), axis=0)
It simply computes the average fraction of correct classifications, i.e. an accuracy, not an error.
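You can check this yourself by recomputing the score from oob_decision_function_ (a sketch for the single-output classification case, on hypothetical data):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)

# The mean of correct OOB majority votes reproduces oob_score_,
# confirming it is an accuracy, not an error.
manual = np.mean(y == np.argmax(rf.oob_decision_function_, axis=1))
assert np.isclose(manual, rf.oob_score_)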