
numpy: How can I select specific indexes in an np array for k-fold cross validation?

I have a training data set in matrix form of dimensions 5000 x 3027 (CIFAR-10 data set). Using array_split in numpy, I partitioned it into 5 different parts, and I want to select just one of the parts as the cross validation fold. However my problem comes when I use something like XTrain[[Indexes]] where indexes is an array like [0,1,2,3], because doing this gives me a 3D tensor of dimensions 4 x 1000 x 3027, and not a matrix. How do I collapse the "4 x 1000" into 4000 rows, to get a matrix of 4000 x 3027?

for fold in range(len(X_train_folds)):
    indexes = np.delete(np.arange(len(X_train_folds)), fold) 
    XTrain = X_train_folds[indexes]
    X_cv = X_train_folds[fold]
    yTrain = y_train_folds[indexes]
    y_cv = y_train_folds[fold]

    classifier.train(XTrain, yTrain)
    dists = classifier.compute_distances_no_loops(X_cv)
    y_test_pred = classifier.predict_labels(dists, k)

    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct/num_test)
    k_to_accuracy[k] = accuracy
kwotsin asked May 22 '16 03:05




1 Answer

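Perhaps you can try this instead. np.array_split returns a Python list of arrays, so you can slice the list around the held-out fold, join the two slices with +, and pass the result to np.concatenate, which stacks the remaining folds along the first axis into a single 4000 x 3027 matrix. (I'm new to numpy, so if I am doing something inefficient or wrong, I would be happy to be corrected.)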

import numpy as np

# np.array_split returns a Python list of num_folds arrays
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
k_to_accuracies = {}

for k in k_choices:
    k_to_accuracies[k] = []
    for i in range(num_folds):
        # Stack every fold except fold i into one training matrix
        training_data = np.concatenate(X_train_folds[:i] + X_train_folds[i+1:])
        training_labels = np.concatenate(y_train_folds[:i] + y_train_folds[i+1:])
        # Fold i is held out for validation
        test_data, test_labels = X_train_folds[i], y_train_folds[i]
        classifier.train(training_data, training_labels)
        predicted_labels = classifier.predict(test_data, k)
        # The mean of a boolean array is the fraction of correct predictions
        k_to_accuracies[k].append(np.mean(predicted_labels == test_labels))
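
To see the shapes involved without the full data set, here is a minimal sketch on a toy array. The 20 x 3 size, the variable names, and the choice of held-out fold are assumptions made for illustration, not taken from the question. It shows how fancy indexing over the stacked folds produces the 3D result, and two ways to get a 2D matrix back: np.concatenate over the selected folds, or reshape on the 3D array.

```python
import numpy as np

# Toy stand-in for the training matrix (sizes are assumptions for illustration).
X_train = np.arange(20 * 3).reshape(20, 3)
num_folds = 5

# np.array_split returns a list of 5 arrays, each of shape (4, 3).
folds = np.array_split(X_train, num_folds)

# Stacking the folds and fancy-indexing keeps the fold axis, giving a 3D array.
stacked = np.array(folds)[[0, 1, 2, 3]]
print(stacked.shape)  # (4, 4, 3)

# Option 1: concatenate every fold except the held-out one along axis 0.
held_out = 4
train_concat = np.concatenate(folds[:held_out] + folds[held_out + 1:])
print(train_concat.shape)  # (16, 3)

# Option 2: collapse the first two axes of the 3D array with reshape.
train_reshaped = stacked.reshape(-1, stacked.shape[-1])
print(train_reshaped.shape)  # (16, 3)

# Both routes yield the same rows in the same order.
same = np.array_equal(train_concat, train_reshaped)
```

The reshape route only works when every fold has the same number of rows. np.array_split allows unequal folds, in which case concatenating the list of folds is the safer choice.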
Abhas Sinha answered Sep 27 '22 16:09