 

Scikit-learn cross validation scoring for regression

How can one use cross_val_score for regression? The default scoring seems to be accuracy, which is not very meaningful for regression. Say I would like to use mean squared error; is it possible to specify that in cross_val_score?

I tried the following two, but neither works:

scores = cross_validation.cross_val_score(svr, diabetes.data, diabetes.target, cv=5, scoring='mean_squared_error')  

and

scores = cross_validation.cross_val_score(svr, diabetes.data, diabetes.target, cv=5, scoring=metrics.mean_squared_error) 

The first one generates a list of negative numbers, while mean squared error should always be non-negative. The second one complains that:

mean_squared_error() takes exactly 2 arguments (3 given) 
asked Jun 10 '14 by clwen

People also ask

Can cross-validation be used for regression?

Common methods include the validation set approach, leave-one-out cross-validation, k-fold cross-validation, and repeated k-fold cross-validation. The (repeated) k-fold approach is generally recommended for estimating the prediction error rate, and it can be used in both regression and classification settings.
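For illustration, a minimal sketch of repeated k-fold cross-validation in a regression setting, assuming a recent scikit-learn (RepeatedKFold lives in sklearn.model_selection) and the diabetes dataset:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5 folds, repeated 3 times with different random shuffles
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring='neg_mean_squared_error')
print(-scores.mean())  # flip the sign back to report a positive MSE estimate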

What is cross_val_score in sklearn?

cross_val_score is a function in the scikit-learn package which trains and tests a model over multiple folds of your dataset. This cross-validation method gives you a better understanding of model performance over the whole dataset instead of just a single train/test split.
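To make that concrete, a rough sketch (assuming a recent scikit-learn and the diabetes dataset) comparing one score from a single split against the per-fold scores returned by cross_val_score:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

# A single train/test split yields one R^2 score
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
single_score = model.fit(X_tr, y_tr).score(X_te, y_te)

# cross_val_score yields one R^2 score per fold (the default scoring for regressors)
cv_scores = cross_val_score(model, X, y, cv=5)
print(single_score, cv_scores.mean(), cv_scores.std())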

How do you use cross-validation in scikit-learn?

The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset, e.g. from sklearn.model_selection import cross_val_score followed by building an estimator such as clf = svm.SVC(...).
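The original snippet is cut off; a fuller version along the lines of the scikit-learn user guide example (assuming the iris dataset, as the docs use) might look like:

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(kernel='linear', C=1)

# One accuracy score per fold for 5-fold cross-validation
scores = cross_val_score(clf, X, y, cv=5)
print(scores)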


2 Answers

I don't have the reputation to comment, but I want to provide this link for you and/or any passers-by, where the negative output of the MSE in scikit-learn is discussed: https://github.com/scikit-learn/scikit-learn/issues/2439

In addition (to make this a real answer), your first option is correct: not only is MSE the metric you want to use to compare models, but R^2 cannot even be calculated for some types of cross-validation (with leave-one-out, for instance, each test fold contains a single sample, so R^2 is undefined).

If you choose MSE as the scorer, it outputs an array of errors which you can then take the mean of, like so:

# Doing linear regression with leave one out cross val
from sklearn import cross_validation, linear_model
import numpy as np

# Including this to remind you that it is necessary to use numpy arrays rather
# than lists otherwise you will get an error
X_digits = np.array(x)
Y_digits = np.array(y)

loo = cross_validation.LeaveOneOut(len(Y_digits))

regr = linear_model.LinearRegression()

scores = cross_validation.cross_val_score(regr, X_digits, Y_digits, scoring='mean_squared_error', cv=loo)

# This will print the mean of the list of errors that were output and
# provide your metric for evaluation
print scores.mean()
answered by Sirrah
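For readers on a newer scikit-learn: the cross_validation module used above was later removed and the scorer name changed, so here is a rough sketch of the same leave-one-out idea with the current model_selection API and the diabetes data (an assumption-laden update, not the original answerer's code):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)
regr = LinearRegression()

# 'mean_squared_error' was renamed 'neg_mean_squared_error' in later releases
scores = cross_val_score(regr, X, y, scoring='neg_mean_squared_error', cv=LeaveOneOut())
print(-scores.mean())  # flip the sign back to report a positive MSE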


The first one is correct. It outputs the negative of the MSE, as it always tries to maximize the score. Please help us by suggesting an improvement to the documentation.

answered by Andreas Mueller
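As a small follow-up illustrating that point with the question's own call (this reuses that era's sklearn.cross_validation module, which was removed in later releases; just flip the sign to read the values as ordinary MSE):

from sklearn import cross_validation, datasets, svm

diabetes = datasets.load_diabetes()
svr = svm.SVR()

# cross_val_score always maximizes, so the 'mean_squared_error' scorer returns negated values
scores = cross_validation.cross_val_score(svr, diabetes.data, diabetes.target, cv=5, scoring='mean_squared_error')
print(-scores)           # positive MSE per fold
print((-scores).mean())  # averaged across folds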