I am trying to run the following code for a brief machine learning algorithm:
import re
import argparse
import csv
from collections import Counter
from sklearn import datasets
import sklearn
from sklearn.datasets import fetch_mldata
dataDict = datasets.fetch_mldata('MNIST Original')
In this piece of code, I am trying to read the 'MNIST Original' dataset hosted at mldata.org via sklearn. This results in the following error (there are more lines of code, but the error occurs at this particular line):
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.3\helpers\pydev\pydevd.py", line 1481, in <module>
debugger.run(setup['file'], None, None)
File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.3\helpers\pydev\pydevd.py", line 1124, in run
pydev_imports.execfile(file, globals, locals) #execute the script
File "C:/Users/sony/PycharmProjects/Machine_Learning_Homework1/zeroR.py", line 131, in <module>
dataDict = datasets.fetch_mldata('MNIST Original')
File "C:\Anaconda\lib\site-packages\sklearn\datasets\mldata.py", line 157, in fetch_mldata
matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
File "C:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 176, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "C:\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 294, in get_variables
res = self.read_var_array(hdr, process)
File "C:\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 257, in read_var_array
return self._matrix_reader.array_from_header(header, process)
File "mio5_utils.pyx", line 624, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5717)
File "mio5_utils.pyx", line 653, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5147)
File "mio5_utils.pyx", line 721, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex (scipy\io\matlab\mio5_utils.c:6134)
File "mio5_utils.pyx", line 424, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric (scipy\io\matlab\mio5_utils.c:3704)
File "mio5_utils.pyx", line 360, in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy\io\matlab\mio5_utils.c:3429)
File "streams.pyx", line 181, in scipy.io.matlab.streams.FileStream.read_string (scipy\io\matlab\streams.c:2711)
IOError: could not read bytes
I have tried researching this on the internet, but there is hardly any help available. Any expert help with solving this error would be much appreciated.
TIA.
In NLTK there is an nltk.download() function to download the datasets that come with the NLP suite. The sklearn documentation talks about loading datasets (http://scikit-learn.org/stable/datasets/) and fetching data from http://mldata.org/, but for the rest of the datasets, the instructions are to download from the source.
sklearn.datasets.fetch_covtype will load the covertype dataset; it returns a dictionary-like object with the feature matrix in the data member and the target values in target. The dataset will be downloaded from the web if necessary.
As of version 0.20, sklearn deprecates the fetch_mldata function and adds fetch_openml instead.
Download MNIST dataset with the following code:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')
There are some changes to the format, though. For instance, mnist['target'] is an array of string category labels (not floats as before).
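If downstream code expects numeric labels, the string labels can be cast back. Below is a minimal sketch, not part of the original answer: the helper names labels_to_int and load_mnist are hypothetical, and the version=1 and as_frame=False arguments assume a scikit-learn release recent enough to support them (as_frame is not available in 0.20).

```python
import numpy as np


def labels_to_int(target):
    """Convert string digit labels such as '5' to unsigned 8-bit integers."""
    return np.asarray(target).astype(np.uint8)


def load_mnist():
    """Fetch MNIST from OpenML (needs network access) and return (X, y).

    Assumption: the installed scikit-learn accepts the as_frame argument.
    """
    from sklearn.datasets import fetch_openml

    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    return mnist.data, labels_to_int(mnist.target)
```

Calling load_mnist() downloads the data on first use, so only labels_to_int can be tried offline, for example on a short list such as ["5", "0", "4"].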
It looks like the cached data is corrupted. Try removing it and downloading again (it takes a moment). Unless specified otherwise, the data for 'MNIST original' should be in
~/scikit_learn_data/mldata/mnist-original.mat
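Removing the cached file can also be done from Python. This is a rough sketch using only the standard library; the function names are made up for illustration, and it assumes the cache lives under ~/scikit_learn_data unless the SCIKIT_LEARN_DATA environment variable or an explicit data_home points elsewhere.

```python
import os
from pathlib import Path


def cached_mldata_path(data_home=None, filename="mnist-original.mat"):
    """Return the path where the mldata cache file is expected to live."""
    if data_home is None:
        # Assumed default location; override via SCIKIT_LEARN_DATA if set.
        data_home = os.environ.get("SCIKIT_LEARN_DATA", "~/scikit_learn_data")
    return Path(data_home).expanduser() / "mldata" / filename


def remove_cached_file(data_home=None, filename="mnist-original.mat"):
    """Delete the cached file if it exists; return True if it was removed."""
    path = cached_mldata_path(data_home, filename)
    if path.is_file():
        path.unlink()
        return True
    return False
```

After remove_cached_file() returns True, the next fetch call should trigger a fresh download instead of reading the truncated file.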
I downloaded the dataset from this link:
https://github.com/amplab/datascience-sp14/blob/master/lab7/mldata/mnist-original.mat
Then I ran these lines:
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original', transpose_data=True, data_home='files')
Note: the file must be at (your working directory)/files/mldata/mnist-original.mat
I hope this helps; it worked well for me.
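To put the downloaded file where fetch_mldata looks for it, something like the following could be used. It is only a sketch: place_mldata_file is a hypothetical helper, and the "files" default simply mirrors the data_home value used above.

```python
import shutil
from pathlib import Path


def place_mldata_file(downloaded_file, data_home="files"):
    """Copy a manually downloaded .mat file to <data_home>/mldata/."""
    target_dir = Path(data_home) / "mldata"
    # Create files/mldata (or the chosen data_home) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "mnist-original.mat"
    shutil.copyfile(downloaded_file, target)
    return target
```

For example, place_mldata_file("mnist-original.mat") run from the working directory copies the file to files/mldata/mnist-original.mat.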
Here is some sample code showing how to get MNIST data ready to use with sklearn:
def get_data():
    """
    Get MNIST data ready to learn with.

    Returns
    -------
    dict
        With keys 'train' and 'test'. Both have the keys 'X' (features)
        and 'y' (labels).
    """
    from sklearn.datasets import fetch_mldata
    # sklearn.cross_validation was removed; model_selection (0.18+) replaces it
    from sklearn.model_selection import train_test_split

    mnist = fetch_mldata('MNIST original')
    x = mnist.data
    y = mnist.target

    # Scale data to [-1, 1] - this is of major importance!
    x = x / 255.0 * 2 - 1

    x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                        test_size=0.33,
                                                        random_state=42)
    data = {'train': {'X': x_train,
                      'y': y_train},
            'test': {'X': x_test,
                     'y': y_test}}
    return data