
HTTP Error when trying to download MNIST data

Tags: python, pytorch

I am using Google Colab to train a LeNet-300-100 fully-connected neural network on MNIST with Python 3 and PyTorch 1.8.

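I use the following code to apply the transformations and download the MNIST dataset:

import numpy as np
import torchvision
from torchvision import transforms

# MNIST dataset statistics:
# mean = tensor([0.1307]) & std dev = tensor([0.3081])
mean = np.array([0.1307])
std_dev = np.array([0.3081])

transforms_apply = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean = mean, std = std_dev)
    ])

# Download the training split (this is the call that raises the error)
train_dataset = torchvision.datasets.MNIST(
    root = './data', train = True,
    transform = transforms_apply, download = True
    )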

which gives the error:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
 in ()
      2 train_dataset = torchvision.datasets.MNIST(
      3     root = './data', train = True,
----> 4     transform = transforms_apply, download = True
      5     )
      6

11 frames
/usr/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 503: Service Unavailable

What's wrong?

Arun asked Mar 11 '21 05:03



5 Answers

I was having the same 503 error, and this worked for me:

# Fetch a pre-packaged copy of MNIST and unpack it into ./MNIST
!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz

from torchvision.datasets import MNIST
from torchvision import transforms

# With root './', torchvision looks in ./MNIST and finds the unpacked files
train_set = MNIST('./', download=True,
                  transform=transforms.Compose([
                      transforms.ToTensor(),
                  ]), train=True)

test_set = MNIST('./', download=True,
                 transform=transforms.Compose([
                     transforms.ToTensor(),
                 ]), train=False)
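
If you are not in a notebook and cannot use the `!wget` and `!tar` shell commands, the same two steps can probably be done with the Python standard library. This is only a rough sketch: the helper names `download_archive` and `extract_archive` are made up for illustration, and the `http://` scheme on the URL is my assumption, since the wget command above omits it.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: the wget command gives no scheme, so plain http is assumed here.
ARCHIVE_URL = "http://www.di.ens.fr/~lelarge/MNIST.tar.gz"


def download_archive(url: str, target: Path) -> Path:
    """Download url to target, skipping the request if the file already exists."""
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def extract_archive(archive_path: Path, dest_dir: Path) -> None:
    """Unpack a .tar.gz archive into dest_dir (roughly what tar -zxvf does)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Newer Python versions offer a safer "data" extraction filter; use it if present.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(dest_dir), **kwargs)


# Usage (performs a network request, so it is left commented out):
# archive = download_archive(ARCHIVE_URL, Path("MNIST.tar.gz"))
# extract_archive(archive, Path("."))
```

After extraction, the `MNIST('./', ...)` calls above should work unchanged, since they only need the unpacked `MNIST` folder next to the script.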
Saad Hassan answered Oct 22 '22 05:10


The MNIST files hosted at http://yann.lecun.com/exdb/mnist/ have caused a lot of trouble, so PyTorch obtained permission to mirror the dataset and now hosts it on Amazon AWS.

Unfortunately, the fix is only available in the nightly build so far.

A hot fix I found useful is:

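from torchvision import datasets, transforms

# Point every (url, md5) entry at the new mirror, keeping the file name.
# This assumes a torchvision version where MNIST.resources holds full URLs.
new_mirror = 'https://ossci-datasets.s3.amazonaws.com/mnist'
datasets.MNIST.resources = [
   ('/'.join([new_mirror, url.split('/')[-1]]), md5)
   for url, md5 in datasets.MNIST.resources
]

# Placeholder transform; substitute your own pipeline.
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(
   "../data", train=True, download=True, transform=transform
)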
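
To see what the list comprehension in the hot fix does without touching the network or torchvision, here is a small sketch on a plain list. `retarget` is a hypothetical helper name, and `old_resources` is a stand-in for `datasets.MNIST.resources`, assumed to hold `(url, md5)` pairs; the two file names and checksums are the ones quoted elsewhere on this page.

```python
def retarget(resources, new_mirror):
    """Return (url, md5) pairs with each URL moved to new_mirror, keeping the file name."""
    base = new_mirror.rstrip("/")
    return [
        ("/".join([base, url.split("/")[-1]]), md5)
        for url, md5 in resources
    ]


# Stand-in for datasets.MNIST.resources (assumed shape: list of (url, md5) tuples).
old_resources = [
    ("http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz",
     "f68b3c2dcbeaaa9fbdd348bbdeb94873"),
    ("http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz",
     "d53e105ee54ea40749a09fcbcd1e9432"),
]

new_resources = retarget(old_resources, "https://ossci-datasets.s3.amazonaws.com/mnist")
for url, md5 in new_resources:
    print(url, md5)
```

Only the host part of each URL changes; the checksums are carried over untouched, so torchvision can still verify the downloaded files.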

Update: According to torchvision issue 3549, this will be fixed in the next minor release.
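
Once the fixed release is installed, the hot fix should no longer be needed, so it may be worth guarding it with a version check. The sketch below is one way to do that with plain string parsing. `parse_version` and `needs_mirror_patch` are hypothetical names, and the `(0, 9, 1)` threshold is an assumption taken from another answer on this page, which reports the fix in torchvision 0.9.1.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a string such as '0.9.0+cu101' into a tuple of its leading integers."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def needs_mirror_patch(version: str, fixed_in: tuple = (0, 9, 1)) -> bool:
    """Return True when the version is older than the release assumed to contain the fix."""
    return parse_version(version) < fixed_in


# Usage (requires torchvision, so it is left commented out):
# import torchvision
# if needs_mirror_patch(torchvision.__version__):
#     ...  # apply the hot fix here
```

Comparing integer tuples avoids the trap of comparing version strings directly, where '0.10.0' would sort before '0.9.1'.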

user3411517 answered Oct 22 '22 05:10


This problem has been fixed in torchvision==0.9.1. If you cannot upgrade yet, use the following workaround as a temporary solution:

import torch
from torchvision import datasets, transforms
datasets.MNIST.resources = [
    ('https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz', 'f68b3c2dcbeaaa9fbdd348bbdeb94873'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz', 'd53e105ee54ea40749a09fcbcd1e9432'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz', '9fb629c4189551a2d022fa330f9573f3'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz', 'ec29112dd5afa0611ce80d1b7f02629c')
]

# The rest of your train and test code stays as usual, for example:
batch_sz = 100
tr_ = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# MNIST
train_dataset = datasets.MNIST(
    root='./dataset', 
    train=True, 
    transform=tr_,  
    download=True
)

test_dataset = datasets.MNIST(
    root='./dataset', 
    train=False, 
    transform=tr_  
)
# DataLoader
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=batch_sz,
    shuffle=True 
)

test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset,
    batch_size=batch_sz,
    shuffle=False 
)
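
The second element of each `resources` entry above is an MD5 checksum. If you want to check a downloaded file by hand against one of those values, a sketch using only `hashlib` could look like this. `md5_of_file` and `matches_md5` are hypothetical helper names, and the file path in the usage comment is an assumption about where torchvision stores the raw files for `root='./dataset'`.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_md5):
    """Return True when the file's MD5 digest equals expected_md5 (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()


# Usage (the path is an assumption; adjust it to your root folder):
# ok = matches_md5("./dataset/MNIST/raw/train-images-idx3-ubyte.gz",
#                  "f68b3c2dcbeaaa9fbdd348bbdeb94873")
```

A mismatch usually points to a truncated or corrupted download, in which case deleting the file and downloading again is the simplest remedy.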
Färid Alijani answered Oct 22 '22 05:10


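You can try loading MNIST through scikit-learn instead:

from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays, so reshape works on the result.
mnist = fetch_openml('mnist_784', data_home=".", as_frame=False)

x = mnist.data
x = x.reshape((-1, 28, 28))
x = x.astype('float32')

# Labels come back as strings ('0' to '9'); convert them to integers.
y = mnist.target
y = y.astype('int64')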
Fei Wu answered Oct 22 '22 06:10


This is for PyTorch 0.4.0 in Udacity notebooks.

The solution is inspired by the hot fix above.

from torchvision import datasets, transforms

# Older torchvision versions expose the download locations as MNIST.urls.
new_mirror = 'https://ossci-datasets.s3.amazonaws.com/mnist'
datasets.MNIST.urls = [
   '/'.join([new_mirror, url.split('/')[-1]])
   for url in datasets.MNIST.urls
]
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
Okasha55 answered Oct 22 '22 05:10