I am learning how to develop a Backpropagation Neural Network using scikit-learn. I still confuse with how to implement k-fold cross validation in my neural network. I wish you guys can help me out. My code is as follow:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier
f = open("seeds_dataset.txt")
data = np.loadtxt(f)
X=data[:,0:]
y=data[:,-1]
kf = KFold(n_splits=10)
X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto',
beta_1=0.9, beta_2=0.999, early_stopping=False,
epsilon=1e-08, hidden_layer_sizes=(5, 2), learning_rate='constant',
learning_rate_init=0.001, max_iter=200, momentum=0.9,
nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
warm_start=False)
K-Folds cross-validator. Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default). Each fold is then used once as a validation while the k - 1 remaining folds form the training set. Read more in the User Guide.
Do not split your data into train and test. This is automatically handled by the KFold cross-validation.
from sklearn.model_selection import KFold
kf = KFold(n_splits=10)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
for train_indices, test_indices in kf.split(X):
clf.fit(X[train_indices], y[train_indices])
print(clf.score(X[test_indices], y[test_indices]))
KFold validation partitions your dataset into n equal, fair portions. Each portion is then split into test and train. With this, you get a fairly accurate measure of the accuracy of your model since it is tested on small portions of fairly distributed data.
Kudos to @COLDSPEED's answer.
If you'd like to have the prediction of n fold cross-validation, cross_val_predict() is the way to go.
# Scamble and subset data frame into train + validation(80%) and test(10%)
df = df.sample(frac=1).reset_index(drop=True)
train_index = 0.8
df_train = df[ : len(df) * train_index]
# convert dataframe to ndarray, since kf.split returns nparray as index
feature = df_train.iloc[:, 0: -1].values
target = df_train.iloc[:, -1].values
solver = MLPClassifier(activation='relu', solver='adam', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1, verbose=True)
y_pred = cross_val_predict(solver, feature, target, cv = 10)
Basically, the option cv indicates how many cross-validation you'd like to do in the training. y_pred is the same size as target.
In case you are looking for already built in method to do this, you can take a look at cross_validate.
from sklearn.model_selection import cross_validate
model = MLPClassifier()
cv_results = cross_validate(model, X, Y, cv=10,
return_train_score=False,
scoring=model.score)
print("Fit scores: {}".format(cv_results['test_score']))
The thing I like about this approach is it gives you access to the fit_time, score_time, and test_score. It also allows you to supply your choice of scoring metrics and cross-validation generator/iterable (i.e. Kfold). Another good resource is Cross Validation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With