Reading a Matlab's cell array saved as a v7.3 .mat file with H5py

I saved a cell array as a .mat file in Matlab as follows:

test = {'hello'; 'world!'};
save('data.mat', 'test', '-v7.3')

How can I import it as the list of strings in Python with H5py?

I tried

import h5py

f = h5py.File('data.mat', 'r')
print(f.get('test'))
print(f.get('test')[0])

This prints out:

<HDF5 dataset "test": shape (1, 2), type "|O8">
[<HDF5 object reference> <HDF5 object reference>]

How can I dereference it to get the list of strings ['hello', 'world!'] in Python?

Asked by Franck Dernoncourt on Feb 01 '15 at 02:02



3 Answers

Writing in Matlab:

test = {'Hello', 'world!'; 'Good', 'morning'; 'See', 'you!'};
save('data.mat', 'test', '-v7.3') % v7.3 so that it is readable by h5py

Reading in Python 3 (works for any number of rows or columns, but assumes that each cell is a string):

import h5py
import numpy as np

data = []
with h5py.File("data.mat", "r") as f:
    # HDF5 stores the MATLAB array transposed, so each row here is a MATLAB column
    for column in f['test']:
        row_data = []
        for ref in column:
            # f[ref] dereferences the object reference; the char data comes
            # back as uint16 character codes of shape (N, 1)
            row_data.append(''.join(map(chr, f[ref][:].ravel())))
        data.append(row_data)

print(data)
print(np.transpose(data))

Output:

[['Hello', 'Good', 'See'], ['world!', 'morning', 'you!']]

[['Hello' 'world!']
 ['Good' 'morning']
 ['See' 'you!']]
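The loop above can be packaged as a reusable function. This is a minimal sketch, not part of h5py: the helper names matlab_str and read_cellstr are hypothetical, f is assumed to be an open h5py.File, and every cell is assumed to hold a non-empty 1xN char row (MATLAB stores chars as uint16 UTF-16 code units and appears to encode empty arrays differently, so empty strings are not handled). Decoding the code units as UTF-16, instead of converting each one separately, also copes with characters outside the Basic Multilingual Plane, which arrive as surrogate pairs. The final zip transposes the result back to MATLAB's row/column orientation.

```python
import numpy as np


def matlab_str(codes) -> str:
    """Decode MATLAB char data (uint16 UTF-16 code units) into a Python str.

    Assumes a non-empty 1xN char row, which h5py exposes with shape (N, 1).
    """
    units = np.asarray(codes, dtype=np.uint16).ravel()
    # Force little-endian bytes, then decode as UTF-16 so surrogate pairs are joined.
    return units.astype("<u2").tobytes().decode("utf-16-le")


def read_cellstr(f, name):
    """Read a MATLAB cell array of strings as a list of rows (MATLAB orientation).

    Assumes `f` is an open h5py.File, where f[ref] dereferences an object reference.
    """
    refs = f[name][()]  # object array of references, shape (ncols, nrows)
    cols = [[matlab_str(f[ref][()]) for ref in col] for col in refs]
    # HDF5 holds the MATLAB array transposed; transpose back to MATLAB rows.
    return [list(row) for row in zip(*cols)]
```

Usage: inside `with h5py.File("data.mat", "r") as f:`, calling `read_cellstr(f, "test")` on the 3x2 cell above should give [['Hello', 'world!'], ['Good', 'morning'], ['See', 'you!']]; for the question's 2x1 cell it should give [['hello'], ['world!']].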
Answered by Franck Dernoncourt on Oct 14 '22 at 19:10


This answer should be seen as an addition to Franck Dernoncourt's answer, which fully suffices for all cell arrays that contain 'flat' data (for MAT-files of version 7.3 and probably above).

I encountered a case with nested data (e.g. one row of cell arrays inside a named cell array). I was able to get at the data as follows:

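import h5py

# assumption:
# idx_of_interest specifies the index of the cell array we are interested in
# (at the second level)

with h5py.File(file_name, 'r') as f:
    # indexing the outer dataset yields an h5py Reference, not the data itself
    data_of_interest_reference = f['cell_array_name'][idx_of_interest, 0]
    # passing the reference to the file dereferences it; [()] reads the data
    # into memory so it is still usable after the file is closed
    data_of_interest = f[data_of_interest_reference][()]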

Why this works for nested data: if you check the type of the element you want to retrieve at the deeper level, it is h5py.h5r.Reference. To retrieve the data the reference points to, you have to pass that reference back to the file object, as in f[data_of_interest_reference].
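When cells are nested more than one level deep, the same step can be applied recursively. The sketch below is a hypothetical helper (resolve is not an h5py function): it walks an array and replaces every element of an object-dtype array with the data its reference points to, read via f[ref][()]. It assumes f is an open h5py.File, that object-dtype arrays hold only references, and that every reference points to a dataset (references to groups, which is how MATLAB appears to store structs, are not handled). Char and numeric leaves come back as raw NumPy arrays, so strings still need to be decoded from their uint16 codes.

```python
import numpy as np


def resolve(f, value):
    """Recursively replace HDF5 object references in `value` with the data they point to.

    `f` is assumed to be an open h5py.File (f[ref] dereferences a reference);
    `value` is an array as read with dataset[()].
    """
    arr = np.asarray(value)
    if arr.dtype != object:
        # numeric or char data (e.g. uint16 codes): nothing left to dereference
        return arr
    out = np.empty(arr.shape, dtype=object)
    for idx, ref in np.ndenumerate(arr):
        # follow the reference, read the target into memory, and recurse into it
        out[idx] = resolve(f, f[ref][()])
    return out
```

Inside a `with h5py.File(file_name, 'r') as f:` block, `resolve(f, f['cell_array_name'][()])` would return a nested object array whose leaves are NumPy arrays, so the result stays usable after the file is closed.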

Answered by Benjamin Spiegl on Oct 14 '22 at 20:10


I know this is an old question, but I found a package that scratches that itch:

hdf5storage

It can be installed with pip and works nicely on Python 3.6 for MATLAB files saved both before and as of version 7.3. According to the docs, for older (pre-7.3) files it calls scipy.io.loadmat.

Answered by magu_ on Oct 14 '22 at 20:10