I noticed that <code>r2_score</code> and <code>explained_variance_score</code> are both built-in <code>sklearn.metrics</code> methods for regression problems. I was always under the impression that <code>r2_score</code> is the percent variance explained by the model. How is it different from <code>explained_variance_score</code>? When would you choose one over the other? Thanks!


Python sci-kit learn (metrics): difference between r2_score and explained_variance_score?

2 Answers

Most of the answers I found (including here) emphasize the difference between R² and the Explained Variance Score, namely the mean residual (i.e. the Mean of Error).

However, an important question is left unanswered: why on earth do I need to consider the Mean of Error?

Refresher:

R² is the Coefficient of Determination, which measures the proportion of variation explained by the (least-squares) linear regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y like this:

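Variance_{actual_y} × R² = Variance_{predicted_y}

So intuitively, the closer R² is to 1, the closer the variance (i.e. the spread) of predicted_y is to that of actual_y. Note that this identity holds for a least-squares fit with an intercept, evaluated on the data it was fitted to.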
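As a quick sanity check of the identity above, here is a minimal sketch using only NumPy. The synthetic data, the seed, and the variable names are assumptions made for illustration; the fit is an ordinary least-squares line with an intercept, scored on its own training data.

```python
import numpy as np

# Synthetic data (hypothetical): a noisy straight line.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# Least-squares line with an intercept; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# R² computed from its definition: 1 - SS_residual / SS_total.
residuals = y - y_hat
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Both sides of: Variance(actual_y) * R² = Variance(predicted_y)
lhs = np.var(y) * r2
rhs = np.var(y_hat)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point error. For predictions that do not come from such a fit (for example on held-out data), the identity is not guaranteed.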

As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that's true:

R² = 1 - [(Sum of Squared Residuals / n) / Variance_{y_actual}]

Explained Variance Score = 1 - [Variance_{(Y_predicted - Y_actual)} / Variance_{y_actual}]

in which:

Variance_{(Y_predicted - Y_actual)} = (Sum of Squared Residuals / n) - (Mean Error)²

So the only difference is that the Explained Variance Score subtracts the squared Mean Error from the numerator of the first formula! ... But why?
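To make the relationship between the two formulas concrete, the sketch below writes both scores out by hand in NumPy and checks that they differ by exactly (Mean Error)² / Variance_{y_actual}. It reuses the small data set from the other answer on this page; the variable names are my own.

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

residuals = y_true - y_pred
n = len(y_true)
mean_error = residuals.mean()
var_y = np.var(y_true)  # population variance (divides by n)

# R² = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]
r2 = 1.0 - (np.sum(residuals ** 2) / n) / var_y

# Explained Variance Score = 1 - [Variance(residuals) / Variance(y_actual)]
explained_variance = 1.0 - np.var(residuals) / var_y

# The gap between the two scores is the squared mean error, scaled by Var(y).
gap = explained_variance - r2
print(r2, explained_variance)
print(gap, mean_error ** 2 / var_y)
```

Because the gap is a squared quantity divided by a variance, the Explained Variance Score can never be smaller than R² for the same predictions.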

When we compare the R² Score with the Explained Variance Score, we are basically checking the Mean Error; so if R² = Explained Variance Score, that means: The Mean Error = Zero!

The Mean Error reflects the systematic tendency of our estimator to overestimate or underestimate, that is, whether the estimation is biased or unbiased.

In Summary:

If you want an unbiased estimator, so that the model is not systematically underestimating or overestimating, you should take the Mean of Error into account. R² penalizes a non-zero Mean Error, while the Explained Variance Score ignores it.
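One way to see this in practice is to add a constant offset to a set of predictions and watch the two scores. The sketch below assumes scikit-learn and NumPy are installed; the helper name `scores_with_offset` and the offsets are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 7])  # residuals average to zero


def scores_with_offset(offset):
    """Return (explained variance, R²) after shifting every prediction by `offset`."""
    shifted = y_pred + offset
    return (
        explained_variance_score(y_true, shifted),
        r2_score(y_true, shifted),
    )


for offset in (0.0, 0.5, 2.0):
    ev, r2 = scores_with_offset(offset)
    print(f"offset={offset}: explained_variance={ev:.4f}  r2={r2:.4f}")
```

The explained variance should stay the same for every offset, while R² drops as the offset grows. A large gap between the two scores is therefore a cheap signal that the predictions are systematically shifted.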


answered Oct 12 '22 15:10

Yahya

OK, look at this example:

In [123]:
import numpy as np
from sklearn import metrics

# data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
# what explained_variance_score really is: 1 - Var(residuals) / Var(y_true)
1 - np.cov(np.array(y_true) - np.array(y_pred)) / np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
# what r^2 really is: 1 - sum of squared residuals / (n * Var(y_true))
1 - ((np.array(y_true) - np.array(y_pred))**2).sum() / (len(y_true) * np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
# notice that the mean residual is not 0
(np.array(y_true) - np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
# if the predicted values are different, such that the mean residual IS 0:
y_pred = [2.5, 0.0, 2, 7]
(np.array(y_true) - np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
# now the two scores are identical
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015

So, when the mean residual is 0, the two scores are the same. Which one to choose depends on your needs, that is: is the mean residual supposed to be 0?

answered Oct 12 '22 15:10

CT Zhu

Tags:

python

scikit-learn

regression

monkeybiz7
