Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to read Mat v7.3 files in python ?

I am trying to read the mat file given in the following website, ufldl.stanford.edu/housenumbers, in the file train.tar.gz, there is a mat file named digitStruct.mat.

when i used scipy.io to read the mat file, it alerts me with the message ' please use hdf reader for matlab v7.3 files '.

the original matlab file is provided as below

load digitStruct.mat
for i = 1:length(digitStruct)
    im = imread([digitStruct(i).name]);
    for j = 1:length(digitStruct(i).bbox)
        [height, width] = size(im);
        aa = max(digitStruct(i).bbox(j).top+1,1);
        bb = min(digitStruct(i).bbox(j).top+digitStruct(i).bbox(j).height, height);
        cc = max(digitStruct(i).bbox(j).left+1,1);
        dd = min(digitStruct(i).bbox(j).left+digitStruct(i).bbox(j).width, width);

        imshow(im(aa:bb, cc:dd, :));
        fprintf('%d\n',digitStruct(i).bbox(j).label );
        pause;
    end
end

as shown above, the mat file has the key 'digitStruct', and within 'digitStruct', key 'name' and 'bbox' can be found, I used h5py API to read the file.

import h5py
f = h5py.File('train.mat')
print len( f['digitStruct']['name'] ), len(f['digitStruct']['bbox']   )

I can read the array, however when I loop though the array, how can I read each item?

for i in f['digitStruct']['name']:
    print i # only print out the HDF5 ref
like image 768
user824624 Avatar asked Nov 03 '15 21:11

user824624


People also ask

Can we read .MAT file in Python?

By default, Python is not capable of reading . mat files.

How do I read a .MAT file?

How to Open an MAT File. MAT files that are Microsoft Access Shortcut files can be created by dragging a table out of Access and to the desktop or into another folder. Microsoft Access needs to be installed in order to use them. MATLAB from MathWorks can open MAT files that are used by that program.


2 Answers

Writing in Matlab:

test = {'Hello', 'world!'; 'Good', 'morning'; 'See', 'you!'};
save('data.mat', 'test', '-v7.3') % v7.3 so that it is readable by h5py

enter image description here

Reading in Python (works for any number or rows or columns, but assumes that each cell is a string):

import h5py
import numpy as np

data = []
with h5py.File("data.mat") as f:
    for column in f['test']:
        row_data = []
        for row_number in range(len(column)):            
            row_data.append(''.join(map(unichr, f[column[row_number]][:])))   
        data.append(row_data)

print data
print np.transpose(data)

Output:

[[u'Hello', u'Good', u'See'], [u'world!', u'morning', u'you!']]

[[u'Hello' u'world!']
 [u'Good' u'morning']
 [u'See' u'you!']]
like image 123
Franck Dernoncourt Avatar answered Sep 18 '22 15:09

Franck Dernoncourt


import numpy as np
import cPickle as pickle
import h5py

f = h5py.File('train/digitStruct.mat')

metadata= {}
metadata['height'] = []
metadata['label'] = []
metadata['left'] = []
metadata['top'] = []
metadata['width'] = []

def print_attrs(name, obj):
    vals = []
        if obj.shape[0] == 1:
            vals.append(int(obj[0][0]))
        else:
            for k in range(obj.shape[0]):
                vals.append(int(f[obj[k][0]][0][0]))
        metadata[name].append(vals)

for item in f['/digitStruct/bbox']:
    f[item[0]].visititems(print_attrs)

with open('train_metadata.pickle','wb') as pf:
  pickle.dump(metadata, pf, pickle.HIGHEST_PROTOCOL)    

I modified it from https://discussions.udacity.com/t/how-to-deal-with-mat-files/160657/3. Honestly, I can't figure out exactly what's going on with visititmes(). The HDF5 file is just too hierarchy and too abstract.

This metadata is a dictionary. The content of each key is an embedded array. The array has 33402 items, which correspond to the png file with the name in order. Each item is an array with length from 1~6. I count the numbers of different digits, which is 5137, 18130, 8691, 1434, 9,1.

What surprises me is that the pickle file is only 9 MB, more than 20 times smaller than the mat file. I guess the HDS file sacrifice storing space for hierarchy structure.

Note: I have converted the values to integers, for the sake of slicing images. Now the train_metadata.pickle file has a size of only 2 MB, 100 times than the mat file.

like image 37
Yuchao Jiang Avatar answered Sep 20 '22 15:09

Yuchao Jiang