how to read Mat v7.3 files in python ？

Tags:

I am trying to read the mat file given in the following website, ufldl.stanford.edu/housenumbers, in the file train.tar.gz, there is a mat file named digitStruct.mat.

when i used scipy.io to read the mat file, it alerts me with the message ' please use hdf reader for matlab v7.3 files '.

the original matlab file is provided as below

Click to copy

load digitStruct.mat
for i = 1:length(digitStruct)
    im = imread([digitStruct(i).name]);
    for j = 1:length(digitStruct(i).bbox)
        [height, width] = size(im);
        aa = max(digitStruct(i).bbox(j).top+1,1);
        bb = min(digitStruct(i).bbox(j).top+digitStruct(i).bbox(j).height, height);
        cc = max(digitStruct(i).bbox(j).left+1,1);
        dd = min(digitStruct(i).bbox(j).left+digitStruct(i).bbox(j).width, width);

        imshow(im(aa:bb, cc:dd, :));
        fprintf('%d\n',digitStruct(i).bbox(j).label );
        pause;
    end
end

as shown above, the mat file has the key 'digitStruct', and within 'digitStruct', key 'name' and 'bbox' can be found, I used h5py API to read the file.

Click to copy

import h5py
f = h5py.File('train.mat')
print len( f['digitStruct']['name'] ), len(f['digitStruct']['bbox']   )

I can read the array, however when I loop though the array, how can I read each item?

Click to copy

for i in f['digitStruct']['name']:
    print i # only print out the HDF5 ref

768

asked Nov 03 '15 21:11

2 Answers

Writing in Matlab:

Click to copy

test = {'Hello', 'world!'; 'Good', 'morning'; 'See', 'you!'};
save('data.mat', 'test', '-v7.3') % v7.3 so that it is readable by h5py

enter image description here

Reading in Python (works for any number or rows or columns, but assumes that each cell is a string):

Click to copy

import h5py
import numpy as np

data = []
with h5py.File("data.mat") as f:
    for column in f['test']:
        row_data = []
        for row_number in range(len(column)):            
            row_data.append(''.join(map(unichr, f[column[row_number]][:])))   
        data.append(row_data)

print data
print np.transpose(data)

Output:

Click to copy

[[u'Hello', u'Good', u'See'], [u'world!', u'morning', u'you!']]

[[u'Hello' u'world!']
 [u'Good' u'morning']
 [u'See' u'you!']]

123

answered Sep 18 '22 15:09

Franck Dernoncourt

Click to copy

import numpy as np
import cPickle as pickle
import h5py

f = h5py.File('train/digitStruct.mat')

metadata= {}
metadata['height'] = []
metadata['label'] = []
metadata['left'] = []
metadata['top'] = []
metadata['width'] = []

def print_attrs(name, obj):
    vals = []
        if obj.shape[0] == 1:
            vals.append(int(obj[0][0]))
        else:
            for k in range(obj.shape[0]):
                vals.append(int(f[obj[k][0]][0][0]))
        metadata[name].append(vals)

for item in f['/digitStruct/bbox']:
    f[item[0]].visititems(print_attrs)

with open('train_metadata.pickle','wb') as pf:
  pickle.dump(metadata, pf, pickle.HIGHEST_PROTOCOL)

I modified it from https://discussions.udacity.com/t/how-to-deal-with-mat-files/160657/3. Honestly, I can't figure out exactly what's going on with visititmes(). The HDF5 file is just too hierarchy and too abstract.

This metadata is a dictionary. The content of each key is an embedded array. The array has 33402 items, which correspond to the png file with the name in order. Each item is an array with length from 1~6. I count the numbers of different digits, which is 5137, 18130, 8691, 1434, 9,1.

What surprises me is that the pickle file is only 9 MB, more than 20 times smaller than the mat file. I guess the HDS file sacrifice storing space for hierarchy structure.

Note: I have converted the values to integers, for the sake of slicing images. Now the train_metadata.pickle file has a size of only 2 MB, 100 times than the mat file.

answered Sep 20 '22 15:09

Yuchao Jiang

Related questions
                            
                                Merge columns and create new column with pandas
                            
                                Python SpeechRecognition ignores timeout when listening, and hangs
                            
                                How to save in openpyxl without losing formulae?
                            
                                How do I check for open transactions on a psycopg2 connection?
                            
                                Bounded root finding in scipy
                            
                                Python: how to monkey patch (swap) classes
                            
                                How to load balance celery tasks across several servers?
                            
                                Efficiently find indices of nearest points on non-rectangular 2D grid
                            
                                Django Rest Framework, passing attributes in the url
                            
                                Returning ranked search results using gin index with sqlalchemy
                            
                                Importing a Python module works from command line, but not from PyCharm
                            
                                Using same library with different version across virtual environment
                            
                                In Python, how do I refer to an XML tag that contains a hyphen
                            
                                Pandas: Exponentially decaying sum with variable weights
                            
                                Cython ndarray with arbitrary dimensions
                            
                                Vaultier is unusable for docker/ubuntu/debian (Python)
                            
                                How to detect black and gray from an image
                            
                                Open a file, read content, make content into a list using regex, then print list in python
                            
                                Use Python alongside C# in Windows UWP app
                            
                                Use Regex to extract file path and save it in python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

how to read Mat v7.3 files in python ？

Tags:

python

hdf5

matlab

h5py

mat

user824624

People also ask

2 Answers

Franck Dernoncourt

Yuchao Jiang

Recent Activity

Donate For Us