I'm quite new to Keras and wanted to begin with a tutorial. Almost at the beginning, these lines of code appear:
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
I wonder how Keras knows which part of the data belongs to the training set and which to the test set. Though it's quite a basic question, I can't find a clear definition in the Keras documentation (searching there doesn't return any results either). I'd therefore appreciate any help, as I often cannot find command definitions for Keras. For other languages, like C++, R, Python and so on, it is quite easy to find definitions. But for Keras, even Google doesn't give me useful search results (at least not in the first two pages).
TL;DR: How does load_data() know which part of the data set is train and which is test?
The best way to find out is to look at Keras's code:
def load_data(path='mnist.npz'):
    path = get_file(path,
                    origin='https://s3.amazonaws.com/img-datasets/mnist.npz',
                    file_hash='8a61469f7ea1b51cbae51d4f78837e45')
    with np.load(path, allow_pickle=True) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)
As you can see, it simply downloads a file containing the dataset, which is already split into train and test data.
The only parameter (path) is where the downloaded dataset is stored locally.
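To make the point concrete, here is a minimal sketch that builds a tiny stand-in archive with the same four keys and reads it back the same way as the code quoted above. It only needs NumPy and does not download anything. The file name toy_mnist.npz, the helper names save_toy_archive and load_toy_data, and the random data are all made up for illustration; they are not part of Keras.

```python
import os
import tempfile

import numpy as np


def save_toy_archive(path):
    """Write a tiny stand-in for mnist.npz with the same four keys (made-up data)."""
    rng = np.random.default_rng(0)
    np.savez(
        path,
        x_train=rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8),
        y_train=rng.integers(0, 10, size=6, dtype=np.uint8),
        x_test=rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8),
        y_test=rng.integers(0, 10, size=2, dtype=np.uint8),
    )


def load_toy_data(path):
    """Mirror the quoted loading logic: the split is read by key, not computed."""
    with np.load(path) as f:
        x_train, y_train = f["x_train"], f["y_train"]
        x_test, y_test = f["x_test"], f["y_test"]
    return (x_train, y_train), (x_test, y_test)


# Hypothetical location for the toy archive.
archive_path = os.path.join(tempfile.mkdtemp(), "toy_mnist.npz")
save_toy_archive(archive_path)

# The archive itself already names which arrays are train and which are test.
with np.load(archive_path) as f:
    archive_keys = sorted(f.files)

(x_train, y_train), (x_test, y_test) = load_toy_data(archive_path)
print(archive_keys)
print(x_train.shape, x_test.shape)
```

In other words, nothing is decided at load time: whoever created the file chose which samples go under the train keys and which under the test keys.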
For questions about the Keras source, I recommend searching the GitHub repository, e.g., Google "keras mnist github". According to the source code, mnist.load_data() unpacks a dataset that was specifically pickled into a format that allows extracting the data as shown in the source code (already sorted into train vs. test, pre-shuffled, etc.).
Keras then returns the unpacked data in the form you used above.
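If you want to check what you got back, a small helper like the one below can summarise the two tuples. This is only a sketch: describe_split is a hypothetical name, and the arrays are zero-filled stand-ins so the snippet runs without Keras or a download. The real call is shown in the comment.

```python
import numpy as np


def describe_split(train, test):
    """Summarise two (images, labels) tuples in the form load_data() returns."""
    (x_train, y_train), (x_test, y_test) = train, test
    return {
        "train_samples": len(x_train),
        "test_samples": len(x_test),
        "image_shape": x_train.shape[1:],
        "labels_match": len(x_train) == len(y_train) and len(x_test) == len(y_test),
    }


# Stand-in arrays so the sketch runs offline.
# With Keras installed, the real call would be:
#     from keras.datasets import mnist
#     train, test = mnist.load_data()
train = (np.zeros((6, 28, 28), dtype=np.uint8), np.zeros(6, dtype=np.uint8))
test = (np.zeros((2, 28, 28), dtype=np.uint8), np.zeros(2, dtype=np.uint8))

summary = describe_split(train, test)
print(summary)
```

On the real MNIST data this reports 60,000 training samples and 10,000 test samples, each image having shape (28, 28).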