I get different values for different runs. What am I doing wrong here?
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
X = np.random.random((100,5))
y = np.random.randint(0,2,(100,))
clf = RandomForestClassifier()
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42321429 0.44360902 0.34398496]
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
# [ 0.42678571 0.46804511 0.36090226]
The cross_val_score() function performs the evaluation: it takes the estimator, the dataset and the cross-validation configuration, and returns an array with one score per fold.
sklearn.model_selection.cross_val_score: Evaluate a score by cross-validation.
random_state : int. Random number seed (replaces seed). random_state is the one to use; however, no matter what random_state or seed I use, the model produces the same results.
cross_val_score runs single-metric cross-validation, whilst cross_validate runs multi-metric cross-validation. This means that cross_val_score accepts only a single metric and returns its value for each fold, whilst cross_validate accepts a list of metrics and returns all of them for each fold.
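The difference between the two functions can be sketched as follows. This is a minimal example, not taken from the original post: the synthetic data, the seed 0, n_estimators=20 and the choice of 'accuracy' as a second metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_validate

# Synthetic data (assumption): 100 samples, 5 features, binary labels.
rng = np.random.RandomState(0)
X = rng.random_sample((100, 5))
y = rng.randint(0, 2, 100)

clf = RandomForestClassifier(n_estimators=20, random_state=0)
cv = StratifiedKFold(n_splits=3)  # no shuffling, so the folds are deterministic

# Single metric: a 1-D array with one score per fold.
single = cross_val_score(clf, X, y, scoring="roc_auc", cv=cv)

# Several metrics: a dict with one array per metric, plus timing arrays.
multi = cross_validate(clf, X, y, scoring=["roc_auc", "accuracy"], cv=cv)

print(single.shape)   # (3,)
print(sorted(multi))  # ['fit_time', 'score_time', 'test_accuracy', 'test_roc_auc']
```

Because the classifier and the folds are both fixed here, multi["test_roc_auc"] holds the same three values as single.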
The mistake you are making is constructing RandomForestClassifier without random_state. Its default is None, so the forest draws its randomness from NumPy's global np.random state, and that state changes from one call to the next.
To produce equal arrays of cross-validation scores, random_state must be fixed in RandomForestClassifier, and also in StratifiedKFold whenever it shuffles the data. The two values do not have to be the same; each only has to be a fixed integer.
Illustration:
X = np.random.random((100, 5))
y = np.random.randint(0, 2, (100,))
clf = RandomForestClassifier(random_state=1)
cv = StratifiedKFold(n_splits=3) # Folds are not shuffled, so setting random_state is not necessary here
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
##[ 0.57612457 0.29044118 0.30514706]
s = cross_val_score(clf, X, y, scoring='roc_auc', cv=cv)
print(s)
##[ 0.57612457 0.29044118 0.30514706]
Another way to counter it is to leave random_state unset for both RFC and SKF and simply call np.random.seed(value) at the beginning, before the random arrays are created. This also produces equal arrays at the output from one run of the script to the next.
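A minimal sketch of the global-seed approach, assuming a recent scikit-learn and the default n_jobs (everything runs in one process). The helper run(), the seed value 42 and n_estimators=20 are hypothetical choices for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def run(seed):
    """Hypothetical helper: seed NumPy globally, then build data and evaluate."""
    # Estimators with random_state=None draw from this global generator.
    np.random.seed(seed)
    X = np.random.random((100, 5))
    y = np.random.randint(0, 2, (100,))
    clf = RandomForestClassifier(n_estimators=20)  # random_state left as None
    cv = StratifiedKFold(n_splits=3)               # no shuffling, deterministic folds
    return cross_val_score(clf, X, y, scoring="roc_auc", cv=cv)

first = run(42)
second = run(42)
print(np.array_equal(first, second))  # True
```

The seed has to be reset before each evaluation, as run() does. Seeding once and then calling cross_val_score twice in the same script would still give two different arrays, because the first call advances the global generator.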