Need help understanding cross_val_score in sklearn python

I am currently trying to implement K-fold cross-validation for classification using sklearn in Python. I understand the basic concept behind K-fold and cross-validation. However, I don't understand what cross_val_score is, what it does, and what role the CV iterations play in producing the array of scores. Below is an example from the official sklearn documentation page.

**Example 1**
```python
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
print(cross_val_score(lasso, X, y, cv=3))
```

***OUTPUT***

```
[0.33150734 0.08022311 0.03531764]
```

Looking at Example 1, the output is an array of 3 values. I know that when we use KFold, n_splits is the parameter that sets the number of folds. So what does cv do in this example?

**My Code**
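```python
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import BernoulliNB

# df (feature DataFrame), X and y (labels) are assumed to be defined earlier
# random_state is omitted because it only has an effect when shuffle=True
kf = KFold(n_splits=4, shuffle=False)
print('Get_n_splits', kf.get_n_splits(X), '\n\n')
for train_index, test_index in kf.split(X):
    print('TRAIN:', train_index, 'TEST:', test_index)
    x_train, x_test = df.iloc[train_index], df.iloc[test_index]
    # index the labels, not df; assumes y is a NumPy array (use y.iloc for a Series)
    y_train, y_test = y[train_index], y[test_index]

print('\n\n')

# use train_test_split to split into training and testing data
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# fit / train the model using the training data
clf = BernoulliNB()
model = clf.fit(x_train, y_train)
y_predicted = clf.predict(x_test)

# cross_val_score clones the estimator and refits it on each of the cv folds
scores = cross_val_score(model, df, y, cv=4)
print('\n\n')
print('Bernoulli Naive Bayes Classification Cross-validated Scores:', scores)
print('\n\n')
```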

Looking at My Code, I am using 4-fold cross-validation for a Bernoulli Naive Bayes classifier, with cv=4 in the scoring call: `scores = cross_val_score(model, df, y, cv=4)`. This line gives me an array of 4 values. However, if I change it to cv=8, as in `scores = cross_val_score(model, df, y, cv=8)`, an array of 8 values is generated. So again, what does cv do here?

I did read the documentation over and over again and searched numerous websites but since I am a newbie, I really don't understand what cv does and how the scores are generated.

Any and all help would be really appreciated.

Thanks in advance


asked Oct 02 '18 15:10

Stevi G

1 Answer

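In K-fold cross-validation, the data is split into K folds and the procedure is as follows:

1. A model is trained using K-1 of the folds as training data.
2. The resulting model is validated on the remaining fold.

This process is repeated K times, so that each fold serves as the validation set once, and a performance measure such as accuracy is computed at each step. When cv is an integer, it sets K in cross_val_score: cv=4 produces an array of four scores and cv=8 an array of eight, one score per fold.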
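The procedure above can be sketched as a manual loop and compared with cross_val_score. This is only an illustration under some assumptions of mine: the iris dataset, a GaussianNB classifier, and an explicit KFold splitter passed as cv so that both approaches use the same folds. None of these choices come from the question.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Illustrative data and estimator (assumptions, not the asker's df / BernoulliNB)
X, y = load_iris(return_X_y=True)
clf = GaussianNB()

# Explicit splitter so the manual loop and cross_val_score see the same folds
kf = KFold(n_splits=4, shuffle=True, random_state=0)

manual_scores = []
for train_index, test_index in kf.split(X):
    model = clone(clf)  # fresh, unfitted copy for each fold
    model.fit(X[train_index], y[train_index])
    # score() on a classifier returns accuracy on the held-out fold
    manual_scores.append(model.score(X[test_index], y[test_index]))

manual_scores = np.array(manual_scores)

# cv accepts a splitter object as well as an integer
cv_scores = cross_val_score(clf, X, y, cv=kf)

print(manual_scores)
print(cv_scores)
```

Both arrays should hold one score per fold, which is why changing the number of folds changes the length of the result.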

The figure below, taken from the cross-validation section of the scikit-learn user guide, gives a clear picture.

[Image: Cross Validation]

```python
>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import cross_val_score
>>> iris = datasets.load_iris()
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
>>> scores
array([0.96..., 1.  ..., 0.96..., 0.96..., 1.        ])
>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.03)
```

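Here the five fold scores are summarized as a single mean, reported together with twice their standard deviation. By default, the score computed at each CV iteration comes from the estimator's score method (mean accuracy for classifiers, R² for regressors).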
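To see what the default score means in practice, here is a small sketch, assuming the same iris data and linear SVC as in the snippet above. It compares the default with an explicit scoring="accuracy" and then switches to another built-in metric, "f1_macro", which I picked only for illustration.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel="linear", C=1)

# No scoring argument: each fold is scored with clf.score (accuracy for a classifier)
default_scores = cross_val_score(clf, iris.data, iris.target, cv=5)

# Asking for accuracy explicitly should give the same per-fold values
accuracy_scores = cross_val_score(clf, iris.data, iris.target, cv=5, scoring="accuracy")

# A different metric is selected through the scoring parameter
f1_scores = cross_val_score(clf, iris.data, iris.target, cv=5, scoring="f1_macro")

print("default: ", default_scores)
print("accuracy:", accuracy_scores)
print("f1_macro:", f1_scores)
print("mean accuracy: %0.2f" % default_scores.mean())
```

Whatever metric is used, the result is still one value per fold, and the mean is just a summary of that array.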

I used the following pages as references:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score
https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation


answered Oct 17 '22 00:10

kamranisg


Tags: python, validation, computer-vision, scikit-learn
