
Scikit-learn cross val score: too many indices for array


I have the following code:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
# sklearn.cross_validation was deprecated in 0.18 and removed in 0.20;
# newer releases provide cross_val_score in sklearn.model_selection
from sklearn.cross_validation import cross_val_score

# split the dataset for train and test
combnum['is_train'] = np.random.uniform(0, 1, len(combnum)) <= .75
train, test = combnum[combnum['is_train'] == True], combnum[combnum['is_train'] == False]

et = ExtraTreesClassifier(n_estimators=200, max_depth=None,
                          min_samples_split=10, random_state=0)

labels = train[list(label_columns)].values
tlabels = test[list(label_columns)].values

features = train[list(columns)].values
tfeatures = test[list(columns)].values

et_score = cross_val_score(et, features, labels, n_jobs=-1)
print("{0} -> ET: {1})".format(label_columns, et_score))
```

Checking the shape of the arrays:

```
features.shape
Out[19]: (43069, 34)

labels.shape
Out[20]: (43069, 1)
```

and I'm getting:

```
IndexError: too many indices for array
```

and this is the relevant part of the traceback:

```
---> 22 et_score = cross_val_score(et, features, labels, n_jobs=-1)
```

I'm creating the data from pandas DataFrames. I searched here and saw some references to possible errors with this approach, but I can't figure out how to correct it. Here is what the data arrays look like:

```
features
Out[21]:
array([[ 0.,  1.,  1., ...,  0.,  0.,  1.],
       [ 0.,  1.,  1., ...,  0.,  0.,  1.],
       [ 1.,  1.,  1., ...,  0.,  0.,  1.],
       ...,
       [ 0.,  0.,  1., ...,  0.,  0.,  1.],
       [ 0.,  0.,  1., ...,  0.,  0.,  1.],
       [ 0.,  0.,  1., ...,  0.,  0.,  1.]])

labels
Out[22]:
array([[1],
       [1],
       [1],
       ...,
       [1],
       [1],
       [1]])
```
dartdog asked Aug 13 '15 at 17:08


People also ask

What is the difference between cross_val_score and cross_validate?

The cross_validate function differs from cross_val_score in two ways: it allows specifying multiple metrics for evaluation, and it returns a dict containing fit times and score times (and optionally training scores and fitted estimators) in addition to the test scores.
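To make the difference concrete, here is a minimal sketch that runs both functions on the same folds. It assumes a recent scikit-learn (0.19 or newer, where cross_validate exists in sklearn.model_selection); the bundled iris dataset and the ExtraTreesClassifier settings are stand-ins chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score, cross_validate

# Stand-in data and model (not the question's data)
X, y = load_iris(return_X_y=True)
clf = ExtraTreesClassifier(n_estimators=50, random_state=0)

# One metric: a plain array with one score per fold
scores = cross_val_score(clf, X, y, cv=5)

# Several metrics plus timings: a dict of per-fold arrays
results = cross_validate(
    clf, X, y, cv=5,
    scoring=["accuracy", "f1_macro"],
    return_train_score=True,
)

print(scores.shape)
print(sorted(results.keys()))
```

With a list of metric names, the dict keys are prefixed with `test_` and `train_`, alongside `fit_time` and `score_time`.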

What is cross_val_score in scikit-learn?

cross_val_score is a function in the scikit-learn package that trains and tests a model over multiple folds of your dataset. This gives you a better picture of model performance over the whole dataset than a single train/test split does.

How do you use cross_val_score?

cross_val_score splits the data into, say, 5 folds. For each fold, it fits the model on the other 4 folds and scores it on the held-out fold. It then returns the 5 scores, from which you can calculate a mean and variance. Cross-validation is used to tune parameters and to estimate how well a model will score on data it has not seen.
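The fold loop described above can be written out by hand, which makes it easier to see what cross_val_score does. This is a sketch under stated assumptions: it uses the bundled iris dataset as stand-in data, and it relies on the fact that, for a classifier with an integer `cv` and binary or multiclass labels, scikit-learn uses `StratifiedKFold` without shuffling, so the manual folds match the automatic ones.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = ExtraTreesClassifier(n_estimators=50, random_state=0)

manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(clf)                         # fresh, unfitted copy for each fold
    model.fit(X[train_idx], y[train_idx])      # fit on the other 4 folds
    manual_scores.append(model.score(X[test_idx], y[test_idx]))  # score the held-out fold
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(clf, X, y, cv=5)

# Summarize the 5 fold scores (std is the square root of the variance)
print("mean:", manual_scores.mean(), "std:", manual_scores.std())
print("same as cross_val_score:", np.allclose(manual_scores, auto_scores))
```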

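What does cross_val_score return?

It returns a NumPy array of scores, one per cross-validation fold. (An estimator's own score() method, by contrast, returns a single number; for classifiers this is the mean accuracy.) With cross_val_score you can, for example, compare one RandomForestClassifier with some hyperparameters against another with different hyperparameters and select the better one.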
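A short sketch of that comparison, assuming a recent scikit-learn: the iris dataset and the two `max_depth` values are arbitrary stand-ins, and "best" here simply means the highest mean fold score.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Two hypothetical hyperparameter settings to compare
candidates = {
    "max_depth=2": RandomForestClassifier(n_estimators=50, max_depth=2, random_state=0),
    "max_depth=None": RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # ndarray, one accuracy per fold
    mean_scores[name] = scores.mean()
    print(name, scores.round(3), "mean:", round(scores.mean(), 3))

best = max(mean_scores, key=mean_scores.get)
print("best setting:", best)
```

For searching over many settings, scikit-learn's `GridSearchCV` automates this same pattern.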

1 Answer

When you run cross-validation in scikit-learn, the process requires labels of shape (R,) rather than (R, 1). Although the two hold the same values, they are indexed differently: taking one row of an (R, 1) array gives a length-1 array, while taking one element of an (R,) array gives a scalar. So in your case, just add:

```python
c, r = labels.shape          # here (43069, 1): c rows, r == 1 column
labels = labels.reshape(c,)  # now shape (43069,)
```

before passing it to the cross-validation function.
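The sketch below shows the shape change and the indexing difference on a stand-in array: the random 0/1 labels are hypothetical and only mimic the question's (43069, 1) shape. `ravel()` and `[:, 0]` are shown as equivalent NumPy alternatives to the answer's `reshape`.

```python
import numpy as np

# Stand-in for the question's labels: random 0/1 values in an (R, 1) column
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(43069, 1))

c, r = labels.shape
flat_reshape = labels.reshape(c,)  # the answer's approach -> shape (43069,)
flat_ravel = labels.ravel()        # same result without unpacking the shape
flat_column = labels[:, 0]         # explicitly take the single column

# The indexing difference: a row of the 2-D array is itself a length-1 array,
# while an element of the 1-D array is a scalar
print(labels[0].shape)             # (1,)
print(np.ndim(flat_reshape[0]))    # 0

print(np.array_equal(flat_reshape, flat_ravel), np.array_equal(flat_reshape, flat_column))  # True True
```

If `label_columns` holds just one column name (an assumption about the question's variables), selecting it as a pandas Series, for example `train[label_columns[0]].values`, yields a 1-D array directly and avoids the reshape.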

YE LIANG HARRY answered Oct 23 '22 at 06:10