Implement K-fold cross validation in MLPClassification Python


I am learning how to build a backpropagation neural network with scikit-learn, and I am still confused about how to implement k-fold cross-validation for it. I hope you can help me out. My code is as follows:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

f = open("seeds_dataset.txt")
data = np.loadtxt(f)

X = data[:, :-1]  # all columns except the last; the last column is the class label
y = data[:, -1]
kf = KFold(n_splits=10)
X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)
# Output echoed by the interactive session after clf.fit(X, y):
# MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto',
#        beta_1=0.9, beta_2=0.999, early_stopping=False,
#        epsilon=1e-08, hidden_layer_sizes=(5, 2), learning_rate='constant',
#        learning_rate_init=0.001, max_iter=200, momentum=0.9,
#        nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
#        solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
#        warm_start=False)


asked Jun 21 '17 18:06

Ewok The Sith Lord

3 Answers

You don't need to split your data into train and test sets yourself. KFold generates the train/test indices for every fold; you just loop over them.

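from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

# X, y: features and labels as NumPy arrays, as defined in the question
kf = KFold(n_splits=10, shuffle=True, random_state=1)  # shuffle in case the rows are ordered by class
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)

for train_indices, test_indices in kf.split(X):
    # fit() re-initialises the weights on each call (warm_start=False), so every fold trains a fresh model
    clf.fit(X[train_indices], y[train_indices])
    print(clf.score(X[test_indices], y[test_indices]))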

KFold cross-validation partitions your dataset into k (here 10) roughly equal folds. In each round, one fold is held out as the test set and the model is trained on the remaining k-1 folds, so every sample is used for testing exactly once. Averaging the k scores gives a more reliable estimate of generalization accuracy than a single train/test split.
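To see the partitioning concretely, here is a small sketch on a toy array of 10 samples (the array and the choice of 5 folds are illustrative, not from the question). It prints each fold's train/test indices and checks that the test and training indices never overlap and that every sample is tested exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples, so 5 folds of 2 samples each (illustrative only).
X_toy = np.arange(10).reshape(-1, 1)
kf = KFold(n_splits=5)  # no shuffling, so the folds are contiguous blocks

test_counts = np.zeros(len(X_toy), dtype=int)
for fold, (train_idx, test_idx) in enumerate(kf.split(X_toy)):
    # The held-out fold and the training folds never overlap ...
    assert set(train_idx.tolist()).isdisjoint(test_idx.tolist())
    # ... and together they cover every sample.
    assert len(train_idx) + len(test_idx) == len(X_toy)
    test_counts[test_idx] += 1
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")

# Every sample lands in a test fold exactly once.
print(test_counts.tolist())  # -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```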

answered Oct 22 '22 21:10

cs95

Kudos to @COLDSPEED's answer.

If you want the out-of-fold predictions from n-fold cross-validation, cross_val_predict() is the way to go.

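from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# df: a pandas DataFrame whose last column holds the class label
# Shuffle, then split into train + validation (80%) and a held-out test set (20%)
df = df.sample(frac=1, random_state=0).reset_index(drop=True)
train_fraction = 0.8
n_train = int(len(df) * train_fraction)  # slice bounds must be integers
df_train = df[:n_train]
df_test = df[n_train:]  # kept aside for a final evaluation

# .values converts the DataFrame columns to NumPy arrays
feature = df_train.iloc[:, 0:-1].values
target = df_train.iloc[:, -1].values

clf = MLPClassifier(activation='relu', solver='adam', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1, verbose=True)
# One out-of-fold prediction per training sample, using 10 folds
y_pred = cross_val_predict(clf, feature, target, cv=10)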

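The cv parameter sets the number of folds; with an integer cv and a classifier, scikit-learn uses stratified k-fold splitting. y_pred has the same length as target: each entry is the prediction for that sample made by the model trained on the folds that did not contain it.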
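A self-contained sketch of the same call, using synthetic data from make_classification as a stand-in for the question's dataset (an assumption: 3 classes, 7 features; substitute your own feature and target). It also shows one common use of the out-of-fold predictions, an overall accuracy and confusion matrix:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real data (assumption: 3 classes, 7 features).
feature, target = make_classification(
    n_samples=210, n_features=7, n_informative=5, n_classes=3, random_state=0
)

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2),
                    max_iter=1000, random_state=1)

# One out-of-fold prediction per sample: sample i is predicted by the model
# trained on the 9 folds that do not contain it.
y_pred = cross_val_predict(clf, feature, target, cv=10)

print(y_pred.shape == target.shape)
print("out-of-fold accuracy:", accuracy_score(target, y_pred))
print(confusion_matrix(target, y_pred))
```

Because every prediction comes from a model that never saw that sample, these metrics estimate generalization rather than training fit.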

answered Oct 22 '22 22:10

Michael_Zhang

If you are looking for a built-in method to do this, take a look at cross_validate.

from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

# X, y: feature matrix and labels, as in the question
model = MLPClassifier()
cv_results = cross_validate(model, X, y, cv=10,
                            return_train_score=False,
                            scoring='accuracy')
print("Test scores: {}".format(cv_results['test_score']))

What I like about this approach is that it gives you access to fit_time, score_time, and test_score. It also lets you supply your own scoring metrics and cross-validation generator/iterable (e.g. KFold). Another good resource is Cross Validation.
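As a sketch of those options, the example below passes an explicit shuffled KFold as cv and two scoring metrics at once; the synthetic data and the choice of metrics are illustrative assumptions. With a list of metrics, cross_validate names the result keys test_<metric> (and train_<metric> when return_train_score=True) instead of test_score:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.neural_network import MLPClassifier

# Synthetic data as a stand-in for a real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)

model = MLPClassifier(solver='lbfgs', max_iter=1000, random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # explicit CV generator

cv_results = cross_validate(model, X, y, cv=cv,
                            scoring=['accuracy', 'f1_macro'],
                            return_train_score=True)

# One value per fold for each timing and each metric.
for key in ('fit_time', 'score_time', 'test_accuracy', 'test_f1_macro', 'train_accuracy'):
    print(key, cv_results[key].round(3))
print("mean test accuracy:", cv_results['test_accuracy'].mean())
```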

answered Oct 22 '22 21:10

Arthur Putnam


Tags: python, neural-network, scikit-learn