
How does Keras load_data() know what part of the data is the train and test set?

Tags: keras

I'm quite new to Keras and wanted to start with a tutorial. Near the beginning, these lines appear:

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

How does Keras know which part of the data belongs to the training set and which to the test set? It is a basic question, but I cannot find a clear definition in the Keras documentation (searching there returns nothing). I would appreciate any help, because I often cannot find definitions of Keras commands. For languages such as C++, R and Python it is easy to find them, but for Keras even Google gives me no useful results (at least not on the first two pages).

TL;DR: How does load_data() know which part of the data set is train and which is test?

Ben asked Sep 23 '19 14:09


2 Answers

The best way to find out is to look at the Keras source code:

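import numpy as np
from keras.utils import get_file  # a relative import in the Keras source itself

def load_data(path='mnist.npz'):
    # Download the archive (or reuse the cached copy) and check its hash
    path = get_file(path, origin='https://s3.amazonaws.com/img-datasets/mnist.npz',
                    file_hash='8a61469f7ea1b51cbae51d4f78837e45')
    # The train and test arrays are stored under separate keys in the file
    with np.load(path, allow_pickle=True) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)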

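As you can see, it simply downloads a file containing the dataset, which is already split into train and test data. Keras does not decide the split itself; it reads the arrays stored under the keys x_train, y_train, x_test and y_test. The only parameter, path, is where to cache the downloaded dataset locally (relative to ~/.keras/datasets).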
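To make the mechanism concrete without downloading anything, here is a minimal sketch that writes a tiny .npz archive with the same four keys and reads it back the way load_data() does. The helper names save_toy_archive and load_split_archive, the file name toy_mnist.npz and the random data are assumptions made for illustration; they are not part of Keras.

```python
import os
import tempfile

import numpy as np


def save_toy_archive(path):
    """Write a tiny stand-in for mnist.npz (hypothetical data, same four keys)."""
    rng = np.random.default_rng(0)
    np.savez(
        path,
        x_train=rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8),
        y_train=rng.integers(0, 10, size=6, dtype=np.uint8),
        x_test=rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8),
        y_test=rng.integers(0, 10, size=2, dtype=np.uint8),
    )


def load_split_archive(path):
    """Read the arrays back by key, as load_data() does after downloading."""
    with np.load(path) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)


# Write the toy archive into a temporary directory
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, 'toy_mnist.npz')
save_toy_archive(archive_path)

# The archive is just a set of named arrays; the names carry the split
with np.load(archive_path) as f:
    stored_keys = sorted(f.files)

(x_train, y_train), (x_test, y_test) = load_split_archive(archive_path)
print(stored_keys)
print(x_train.shape, x_test.shape)
```

The split is fixed when the archive is created. Whoever saved the file chose which samples went under the train keys and which under the test keys, and loading only reads them back.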

Julio Daniel Reyes answered Oct 16 '22 19:10


For questions about Keras internals, I recommend searching the GitHub repository, e.g. by Googling "keras mnist github". As the source code shows, mnist.load_data() unpacks a dataset that was saved as a NumPy .npz archive, with the train and test arrays stored under separate keys (so the data is already sorted into train vs. test, pre-shuffled, etc.).

Keras then returns the unpacked data in the form you used above.
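If you want to check what you got back, a small helper can summarise the returned tuple. This is only a sketch: describe_split is a hypothetical name, and the zero-filled stand_in arrays merely imitate the nested (train, test) structure so the example runs without Keras or a download.

```python
import numpy as np


def describe_split(data):
    """Return the sample counts and image shape of a load_data()-style tuple."""
    (x_train, y_train), (x_test, y_test) = data
    # Images and labels must line up one-to-one within each part
    assert len(x_train) == len(y_train)
    assert len(x_test) == len(y_test)
    return {
        'train_samples': len(x_train),
        'test_samples': len(x_test),
        'image_shape': x_train.shape[1:],
    }


# Hypothetical stand-in with the same structure as mnist.load_data()'s result
stand_in = (
    (np.zeros((6, 28, 28), dtype=np.uint8), np.zeros(6, dtype=np.uint8)),
    (np.zeros((2, 28, 28), dtype=np.uint8), np.zeros(2, dtype=np.uint8)),
)
summary = describe_split(stand_in)
print(summary)
```

With the real dataset, describe_split(mnist.load_data()) should report the standard MNIST split of 60,000 training and 10,000 test images, each of shape (28, 28).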

OverLordGoldDragon answered Oct 16 '22 18:10