
How to use cross_val_score with random_state

I get different values for different runs. What am I doing wrong here?

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.random((100,5))
y = np.random.randint(0,2,(100,))
clf = RandomForestClassifier()
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42321429  0.44360902  0.34398496]

s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42678571  0.46804511  0.36090226]
asked Sep 30 '16 01:09 by maxymoo



1 Answer

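The problem is that RandomForestClassifier is created with its default random_state=None. In that case it draws from NumPy's global random generator (np.random), whose state changes with every call, so each fit builds a different forest and the scores differ.

To get identical arrays of cross-validation scores, every source of randomness has to be fixed. RandomForestClassifier needs a fixed random_state. StratifiedKFold needs one only when it shuffles the data. The two seeds do not have to be the same value.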

Illustration:

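import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.random((100, 5))
y = np.random.randint(0, 2, (100,))

# A fixed random_state makes the forest build the same trees on every fit
clf = RandomForestClassifier(random_state=1)
# Folds are not shuffled, so they are deterministic and need no random_state
cv = StratifiedKFold(n_splits=3)
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
##[ 0.57612457  0.29044118  0.30514706]

# Scoring a second time gives the same array
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
##[ 0.57612457  0.29044118  0.30514706]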
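
The illustration above relies on folds that are not shuffled. If the folds are shuffled (shuffle=True), the splitter draws random numbers too, so it also needs a fixed random_state. Here is a minimal sketch of that case, assuming the current sklearn.model_selection API. The values n_splits=3 and n_estimators=10, the seed values, and the helper name score are arbitrary choices for illustration and are not from the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Local generator, so the toy data itself is the same on every run (assumed seed)
rng = np.random.RandomState(0)
X = rng.random_sample((100, 5))
y = rng.randint(0, 2, (100,))


def score():
    """Score a seeded forest on seeded, shuffled stratified folds."""
    clf = RandomForestClassifier(n_estimators=10, random_state=1)
    # shuffle=True makes the splitter random, so it gets its own fixed seed
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
    return cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)


a = score()
b = score()
print(np.array_equal(a, b))  # compares the two score arrays element by element
```

With both the classifier and the shuffled splitter seeded, the two score arrays should match.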

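Another option is to leave random_state unset on both the RandomForestClassifier and the StratifiedKFold and instead call np.random.seed(value) once at the beginning of the script, before the random data is created. Every run of the script then produces the same arrays of scores. Two cross_val_score calls inside the same run will still differ from each other unless the seed is reset before each call, because every call advances the global generator.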
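
A minimal sketch of the np.random.seed approach, assuming the current sklearn.model_selection API and the default sequential execution (n_jobs left unset). The helper name run_once, the seed 42, n_splits=3 and n_estimators=10 are hypothetical choices for illustration and are not from the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def run_once(seed):
    """Seed NumPy's global generator, then build the data and score it."""
    # random_state=None falls back to this global generator
    np.random.seed(seed)
    X = np.random.random((100, 5))
    y = np.random.randint(0, 2, (100,))
    clf = RandomForestClassifier(n_estimators=10)  # random_state left unset on purpose
    cv = StratifiedKFold(n_splits=3)               # unshuffled folds, no seed needed
    return cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)


first = run_once(42)
second = run_once(42)
print(np.array_equal(first, second))  # compares the two score arrays element by element
```

Each call to run_once reseeds the global generator, which stands in for starting the script afresh. Without the reseed, the second call would continue from wherever the first call left the generator.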

answered Oct 14 '22 05:10 by Nickil Maveli