
fetch_mldata: how to manually set up MNIST dataset when source server is down?

I need to run a code that contains these lines:

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

Executing it fails with the following error:

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Since the code tries to download something and my internet connection works fine, I assume the server it is trying to reach is down.

How can I set it up manually?

Piotrek asked Dec 08 '22 14:12


1 Answer

By default, fetch_mldata checks `~/scikit_learn_data/mldata` to see whether the dataset has already been downloaded.

According to the source code:

    # if the file does not exist, download it
    if not exists(filename):
        urlname = MLDATA_BASE_URL % quote(dataname)

So in your case, it will check the location

~/scikit_learn_data/mldata/mnist-original.mat

and if not found, it will download from

http://mldata.org/repository/data/download/matlab/mnist-original.mat

which currently is down as you suspected.
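To see whether the file is already in place, a minimal sketch like the following can help. It uses only the standard library and assumes the default location described above; `expected_mnist_path` is a hypothetical helper name, not part of scikit-learn.

```python
import os


def expected_mnist_path(data_home=None):
    """Return the path where fetch_mldata is expected to look for MNIST.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    if data_home is None:
        # Assumption: the default data folder is ~/scikit_learn_data,
        # as described in the answer above.
        data_home = os.path.join("~", "scikit_learn_data")
    data_home = os.path.expanduser(data_home)
    return os.path.join(data_home, "mldata", "mnist-original.mat")


path = expected_mnist_path()
status = "found" if os.path.exists(path) else "missing"
print(path, "->", status)
```

If this reports "missing", fetch_mldata will try to download the file.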

So what you can do is download the dataset from another location, such as:

https://github.com/amplab/datascience-sp14/blob/master/lab7/mldata/mnist-original.mat

and place it in the folder mentioned above.

After that, when you run fetch_mldata(), it should pick up the downloaded dataset without connecting to mldata.org.
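The copy step can also be scripted. The sketch below is one possible way to do it with the standard library, assuming you have already saved the .mat file somewhere on disk; `install_mnist` and the example download path are hypothetical. Note that the GitHub link above opens a web page, so save the actual binary via the page's download/raw option rather than saving the page itself.

```python
import os
import shutil


def install_mnist(downloaded_file, data_home="~/scikit_learn_data"):
    """Copy a manually downloaded mnist-original.mat into the mldata cache.

    Hypothetical helper for illustration; data_home defaults to the
    location described in the answer above.
    """
    target_dir = os.path.join(os.path.expanduser(data_home), "mldata")
    # Create the cache folder if it does not exist yet.
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "mnist-original.mat")
    shutil.copyfile(downloaded_file, target)
    return target


# Example call (the source path is an assumption, adjust to your system):
# install_mnist(os.path.expanduser("~/Downloads/mnist-original.mat"))
```

The file name at the destination must be exactly mnist-original.mat, since that is the name fetch_mldata looks for.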

Update:

Here ~ refers to the user's home folder. You can use the following code to find the default location of the data folder on your system:

from sklearn.datasets import get_data_home
print(get_data_home())
Vivek Kumar answered Dec 10 '22 04:12