How does Keras load_data() know what part of the data is the train and test set?

Question

I'm quite new to Keras and wanted to start with a tutorial. Almost at the very beginning, it contains the lines

from keras.datasets import mnist

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

How does Keras know which part of the data is for training and which part is for testing? Although it's quite a basic question, I can't find the definition in the Keras documentation (searching there doesn't return any results at all). I'd appreciate any help, as I often can't find command definitions for Keras. For other languages such as C++, R or Python, definitions are easy to find, but for Keras even Google doesn't give me useful results (at least not on the first two pages).

TL;DR: How does load_data() know what is train and test of the data set?

Julio Daniel Reyes · Accepted Answer

The best way to find out is to look at Keras's code:

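import numpy as np
from keras.utils import get_file

def load_data(path='mnist.npz'):
    # Download mnist.npz once and cache it locally (by default under ~/.keras/datasets/)
    path = get_file(path,
                    origin='https://s3.amazonaws.com/img-datasets/mnist.npz',
                    file_hash='8a61469f7ea1b51cbae51d4f78837e45')
    # The archive already stores the two splits under separate keys
    with np.load(path, allow_pickle=True) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)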

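As you can see, it simply downloads a file containing the dataset, and that file is already split into train and test data: the splits are stored under separate keys (x_train/y_train and x_test/y_test). The only parameter, path, is the file name under which the downloaded dataset is cached (by default under ~/.keras/datasets/).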
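To make the point concrete, here is a minimal sketch that needs no download. It builds a tiny stand-in archive with np.savez (random, hypothetical data, not the real MNIST file), then loads it with the same pattern as the Keras function above. The helper name load_split_npz and the file name fake_mnist.npz are made up for illustration; the takeaway is that the split is nothing more than whichever arrays the file's creator saved under each key.

```python
import os
import tempfile

import numpy as np


def load_split_npz(path):
    """Hypothetical helper mirroring the Keras load_data() pattern.

    No splitting happens here: train vs test is whatever was stored
    under the 'x_train'/'y_train' and 'x_test'/'y_test' keys.
    """
    with np.load(path, allow_pickle=True) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)


rng = np.random.default_rng(0)

with tempfile.TemporaryDirectory() as tmp_dir:
    fake_path = os.path.join(tmp_dir, 'fake_mnist.npz')
    # Stand-in data (NOT real MNIST): 6 "training" and 2 "test" 28x28 images.
    np.savez(
        fake_path,
        x_train=rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8),
        y_train=np.array([0, 1, 2, 3, 4, 5], dtype=np.uint8),
        x_test=rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8),
        y_test=np.array([6, 7], dtype=np.uint8),
    )

    # The key names stored in the archive are what define the split.
    with np.load(fake_path) as f:
        keys = sorted(f.files)
    print(keys)  # ['x_test', 'x_train', 'y_test', 'y_train']

    (X_train, y_train), (X_test, y_test) = load_split_npz(fake_path)

print(X_train.shape, X_test.shape)  # (6, 28, 28) (2, 28, 28)
```

Opening the real cached mnist.npz with np.load and printing f.files should list the same four keys, since the Keras code reads exactly those names.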

OverLordGoldDragon · Answer

For Keras source code, I recommend searching the GitHub repository, e.g. Google "keras mnist github". As the source code shows, mnist.load_data() unpacks a dataset that was specifically packed into a NumPy .npz archive, in a format that allows extracting the data exactly as shown in the source (already sorted into train vs test, pre-shuffled, etc.).

Keras then returns the unpacked data in the form you used above.
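As an alternative to searching GitHub, Python's standard inspect module can show you the source of a pure-Python function straight from your local installation. With Keras installed, inspect.getsource(mnist.load_data) after from keras.datasets import mnist should print the function shown in the other answer (the exact code depends on your Keras version). The sketch below uses json.dumps from the standard library as a stand-in so it runs without Keras; the printed file path will differ between installations.

```python
import inspect
import json

# Stand-in for mnist.load_data: any pure-Python function works.
# With Keras installed you could instead try:
#   from keras.datasets import mnist
#   print(inspect.getsourcefile(mnist.load_data))
#   print(inspect.getsource(mnist.load_data))
func = json.dumps

source_file = inspect.getsourcefile(func)  # path of the .py file defining it
source_code = inspect.getsource(func)      # the function's source text

print(source_file)
print(source_code.splitlines()[0])  # first line of the 'def dumps(...' signature
```

This is handy precisely for the situation in the question: when the documentation doesn't describe a function, its local source usually answers the question directly.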


Tags:

keras

Ben
