Using sklearn cross_val_score and kfolds to fit and help predict model

Tags:

python

machine-learning

scikit-learn

cross-validation

I'm trying to understand how to use k-fold cross-validation from the sklearn Python module.

I understand the basic flow:

instantiate a model, e.g. model = LogisticRegression()
fit the model, e.g. model.fit(xtrain, ytrain)
predict, e.g. model.predict(xtest)
use e.g. cross_val_score to test the fitted model's accuracy.
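The first three steps above can be sketched roughly as follows. This is only an illustration: the synthetic data from make_classification and the 75/25 split are assumptions, standing in for whatever dataset is actually being used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()        # instantiate the model
model.fit(xtrain, ytrain)           # fit on the training split
predictions = model.predict(xtest)  # predict from test features, not test labels

# Compare the predictions with the held-out labels.
test_accuracy = accuracy_score(ytest, predictions)
print(test_accuracy)
```

Note that predict takes the test features (xtest); the test labels (ytest) are only used afterwards to score the predictions.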

Where I'm confused is using sklearn's KFold with cross_val_score. As I understand it, the cross_val_score function will fit the model and predict on each of the k folds, giving you an accuracy score for each fold.

e.g. using code like this:

from sklearn import linear_model
from sklearn.model_selection import KFold, cross_val_score

kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = linear_model.LogisticRegression()
accuracies = cross_val_score(lr, X_train, y_train, scoring='accuracy', cv=kf)

So if I have a dataset with training and testing data, and I use the cross_val_score function with KFold to determine the accuracy of the algorithm on my training data for each fold, is the model now fitted and ready for prediction on the testing data? In the case above, that would mean calling lr.predict.


asked Feb 16 '17 02:02

hselbie

1 Answer

No, the model is not fitted. Looking at the source code for cross_val_score:

scores = parallel(delayed(_fit_and_score)(clone(estimator), X, y, scorer,
                                          train, test, verbose, None, fit_params)

As you can see, cross_val_score clones the estimator before fitting it on each fold's training data, so the estimator you passed in is left untouched. cross_val_score returns an array of scores, one per fold, which you can analyse to see how the estimator performs on different folds of the data and to check whether it is overfitting. The cross-validation section of the scikit-learn user guide covers this in more detail.
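A small sketch of this behaviour, assuming synthetic data from make_classification in place of the real X_train and y_train. The manual loop is only an approximation of what cross_val_score does internally, written out to make the cloning visible; it is not the library's actual implementation.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the training data (an assumption for illustration).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = LogisticRegression()

accuracies = cross_val_score(lr, X_train, y_train, scoring="accuracy", cv=kf)
print(accuracies)            # one accuracy value per fold
print(hasattr(lr, "coef_"))  # whether lr itself carries fitted coefficients

# Roughly equivalent by hand: a fresh clone is fitted and scored per fold.
manual = []
for train_idx, test_idx in kf.split(X_train):
    fold_model = clone(lr)  # unfitted copy with the same parameters
    fold_model.fit(X_train[train_idx], y_train[train_idx])
    manual.append(fold_model.score(X_train[test_idx], y_train[test_idx]))
print(manual)
```

Because only the clones are fitted, lr has no coef_ attribute after cross_val_score returns, and calling lr.predict at that point raises a NotFittedError.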

Once you are satisfied with the results of cross_val_score, you need to fit the estimator on the whole training data before you can use it to predict on the test data.
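Put together, the workflow might look like the sketch below. The synthetic dataset and the 80/20 train/test split are assumptions for illustration; the KFold settings mirror those in the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for a dataset with training and testing parts (assumed).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = LogisticRegression()

# Step 1: estimate performance with cross-validation on the training data only.
accuracies = cross_val_score(lr, X_train, y_train, scoring="accuracy", cv=kf)
print(accuracies.mean(), accuracies.std())

# Step 2: fit the estimator itself on the whole training set.
lr.fit(X_train, y_train)

# Step 3: lr is now fitted, so it can predict on the held-out test data.
test_predictions = lr.predict(X_test)
print(lr.score(X_test, y_test))
```

The cross-validation scores are used only to judge the model; the final lr.fit call is what produces the model used for prediction.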


answered Nov 08 '22 21:11

Vivek Kumar
