UPDATE: This is a long question that boils down to, can someone explain the numpy array class to me? I answered my own question below.
I am working on a project to import data from MATLAB into a MySQL database whose contents will be made available through a Django website. I want to use scipy.io.loadmat to get the information from MATLAB into a form I can use in Python, so that I can enter the data into the database with the Django API.
My problem is that I cannot work with the data imported by scipy.io.loadmat. It is loaded in the form of several nested arrays and some of the variable names seem to be lost.
Here is the matlab code for a test structure that I have created for a trial:
sensors.time = [0:1:10].';
sensors.sensor1 = {};
sensors.sensor1.source_type = 'flight';
sensors.sensor1.source_name = 'flight-2';
sensors.sensor1.channels = {};
sensors.sensor1.channels.channel1.name = '1';
sensors.sensor1.channels.channel1.local_ori = 'lateral';
sensors.sensor1.channels.channel1.vehicle_ori = 'axial';
sensors.sensor1.channels.channel1.signals = {};
sensors.sensor1.channels.channel1.signals.signal1.filtered = 'N';
sensors.sensor1.channels.channel1.signals.signal1.filtered_description = 'none';
sensors.sensor1.channels.channel1.signals.signal1.data = sin(sensors.time)+0.1*rand(11,1);
>> sensors
time: [11x1 double]
sensor1: [1x1 struct]
>> sensors.sensor1
source_type: 'flight'
source_name: 'flight-2'
channels: [1x1 struct]
>> sensors.sensor1.channels
channel1: [1x1 struct]
>> sensors.sensor1.channels.channel1
name: '1'
local_ori: 'lateral'
vehicle_ori: 'axial'
signals: [1x1 struct]
>> sensors.sensor1.channels.channel1.signals
signal1: [1x1 struct]
>> sensors.sensor1.channels.channel1.signals.signal1
filtered: 'N'
filtered_description: 'none'
data: [11x1 double]
I can easily visualize this structure as a python dictionary, so it does not seem like this should be such a complicated exercise.
Here is the python code I used to read the file in (eventually I want to read in multiple files):
import glob
import os
import scipy.io

# raw string so the backslashes in the Windows path are not treated as escapes
path = r'C:\Users\c\Desktop\import'
for f in glob.glob(os.path.join(path, '*.mat')):
    matfile = scipy.io.loadmat(f, struct_as_record=True)
This is the resulting dictionary from loadmat:
>>> matfile
{'sensors': array([[ ([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]],[[(array([u'flight'],
dtype='<U6'), array([u'flight-2'],
dtype='<U8'), array([[ ([[(array([u'1'],
dtype='<U1'), array([u'lateral'],
dtype='<U7'), array([u'axial'],
dtype='<U5'), array([[ ([[(array([u'N'],
dtype='<U1'), array([u'none'],
dtype='<U4'), array([[ 0.06273465],[ 0.84363597],[ 1.00035443],[ 0.22117587],[-0.68221775],[-0.87761299],[-0.24108487],[ 0.71871452],[ 1.04690773],[ 0.46512366],[-0.51651414]]))]],)]],
dtype=[('signal1', '|O4')]))]],)]],
dtype=[('channel1', '|O4')]))]])]],
dtype=[('time', '|O4'), ('sensor1', '|O4')]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Tue Jun 07 18:38:32 2011', '__globals__': []}
The data is all there, but I don't know how to access these class objects. I would like to be able to loop over the contents so that I can process multiple sensors, then multiple channels for each sensor, etc.
Any explanations to help me simplify this data structure or suggested changes to make this easier would be greatly appreciated.
Update: based on Nick's suggestion, here are the repr(matfile) and the dir(matfile):
>>> repr(matfile)
"{'sensors': array([[ ([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]], [[(array([u'flight'], \n dtype='<U6'), array([u'flight-2'], \n dtype='<U8'), array([[ ([[(array([u'1'], \n dtype='<U1'), array([u'lateral'], \n dtype='<U7'), array([u'axial'], \n dtype='<U5'), array([[ ([[(array([u'N'], \n dtype='<U1'), array([u'none'], \n dtype='<U4'), array([[ 0.0248629 ],\n [ 0.88663486],\n [ 0.93206871],\n [ 0.22156497],\n [-0.65819207],\n [-0.95592508],\n [-0.22584908],\n [ 0.66569432],\n [ 1.06956739],\n [ 0.51103298],\n [-0.53732649]]))]], [[(array([u'Y'], \n dtype='<U1'), array([u'1. 5 Hz High Pass, 2. remove offset'], \n dtype='<U35'), array([[ 0. ],\n [ 0.84147098],\n [ 0.90929743],\n [ 0.14112001],\n [-0.7568025 ],\n [-0.95892427],\n [-0.2794155 ],\n [ 0.6569866 ],\n [ 0.98935825],\n [ 0.41211849],\n [-0.54402111]]))]])]], \n dtype=[('signal1', '|O4'), ('signal2', '|O4')]))]],)]], \n dtype=[('channel1', '|O4')]))]])]], \n dtype=[('time', '|O4'), ('sensor1', '|O4')]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Wed Jun 08 10:58:19 2011', '__globals__': []}"
>>> dir(matfile)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']
>>> dir(matfile['sensors'])
['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_wrap__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__delslice__', '__div__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__hex__', '__iadd__', '__iand__', '__idiv__', '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__', '__init__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setslice__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
Obviously I need to learn a bit about objects and classes. How can I pull bits of the array out and put them into variables? For example:
time = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
source_type = 'flight'
etc.
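A minimal sketch of how this kind of indexing can work, assuming the layout shown in the repr above: every struct comes back as a 1x1 record array whose fields hold further arrays. The arrays here are built by hand with NumPy as a stand-in for the real file, so only a couple of fields are reproduced and the variable names are illustrative.

```python
import numpy as np

# Hand-built stand-in for part of the loadmat result (not read from a file):
# each MATLAB struct is a 1x1 record array with object-typed fields.
sensor1 = np.empty((1, 1), dtype=[('source_type', 'O'), ('source_name', 'O')])
sensor1['source_type'][0, 0] = np.array([u'flight'])
sensor1['source_name'][0, 0] = np.array([u'flight-2'])

sensors = np.empty((1, 1), dtype=[('time', 'O'), ('sensor1', 'O')])
sensors['time'][0, 0] = np.arange(11).reshape(11, 1)
sensors['sensor1'][0, 0] = sensor1

# The field names are kept in the dtype, so they can be looped over.
field_names = sensors.dtype.names

# [0, 0] picks the single record; ['name'] picks a field of that record.
record = sensors[0, 0]
time = record['time'].ravel().tolist()
source_type = str(record['sensor1'][0, 0]['source_type'][0])

print(field_names)
print(time)
print(source_type)
```

The same pattern ([0, 0] to unwrap the 1x1 array, then a field name) repeats at each nesting level, which is why the variable names looked lost: they are stored in each array's dtype rather than as dictionary keys.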
MATLAB 7.3 and greater: beginning with release 7.3 of MATLAB, MAT-files can be saved in the version 7.3 format, which is based on HDF5 (the format is chosen with the -vX flag at save time; see the documentation for save in MATLAB). scipy.io.loadmat does not read version 7.3 files, but they can be read in Python using, for instance, the PyTables or h5py package.
To load saved variables from a MAT-file into your workspace, double-click the MAT-file in the Current Folder browser. To load a subset of variables from a MAT-file on the Home tab, in the Variable section, click Import Data. Select the MAT-file you want to load and click Open.
I've run into a similar issue with a fairly complex mat file at our company. I'm still getting my head wrapped around the scipy IO module, but here is what we found.
When you load the file with struct_as_record=False, accessing matfile['sensors'] returns a scipy.io.matlab.mio5_params.mat_struct object, which we can use to access the contents below it (squeeze_me=True removes the 1x1 wrappers so no extra indexing is needed). When you print it, it looks like a flat array, but you can still access its attributes to get at the individual components. So you could run something like this to start accessing the components:
from scipy.io import loadmat
matfile = loadmat('myfile.mat', squeeze_me=True, struct_as_record=False)
matfile['sensors'].sensor1.channels.channel1.name
In your case you want to be able to iterate over the elements in the structure, which you can do if you access the _fieldnames property of the mat_struct object. From there you can just loop over the field names and access them with getattr:
for field in matfile['sensors']._fieldnames:
    # getattr returns the value stored under the given field name
    print(getattr(matfile['sensors'], field))
This is at least allowing us to access the deeply nested elements without having to alter our mat files.
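As a rough sketch, the same _fieldnames and getattr approach can be applied recursively to turn the whole structure into nested dictionaries. The helper name struct_to_dict and the FakeStruct class are hypothetical: FakeStruct only mimics the _fieldnames attribute so the example runs without a .mat file, and struct arrays (object arrays of structs) are not handled here.

```python
def struct_to_dict(obj):
    """Recursively convert an object exposing _fieldnames into a nested dict."""
    if hasattr(obj, '_fieldnames'):
        return dict((name, struct_to_dict(getattr(obj, name)))
                    for name in obj._fieldnames)
    # Anything without _fieldnames (strings, lists, arrays) is returned as-is.
    return obj


class FakeStruct(object):
    """Hypothetical stand-in for mat_struct, used only for this demo."""
    def __init__(self, **fields):
        self._fieldnames = list(fields)
        for name, value in fields.items():
            setattr(self, name, value)


sensors = FakeStruct(
    time=[0, 1, 2],
    sensor1=FakeStruct(source_type='flight', source_name='flight-2'),
)

nested = struct_to_dict(sensors)
print(nested['sensor1']['source_type'])
```

With a real file, the argument would be matfile['sensors'] as loaded with squeeze_me=True and struct_as_record=False.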