Alternatives to Matlab's Mat File Format

Tags:

matlab

I'm finding that writing and reading the native MAT file format becomes very, very slow with larger data structures of about 1 GB in size. In addition, we have other, non-MATLAB software that should be able to read and write these files. So I would like to find an alternative format to use to serialize MATLAB data structures. Ideally this format would ...

  1. be able to serialize an arbitrary MATLAB structure to a file.
  2. have faster I/O than MAT files.
  3. have I/O libraries for other languages like Java, Python and C++.
Sean McCauliff asked Sep 26 '12 00:09



1 Answer

Simplifying your data structures and using the newer v7.3 MAT file format, which is based on HDF5, might actually be the best approach. The HDF5 format is open and already has I/O libraries for your other languages. And depending on your data structure, v7.3 files may be faster than the old binary MAT files.
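For the non-MATLAB software, one practical consequence is that a reader first has to know which kind of MAT file it was handed. Here is a rough Python sketch (standard library only) that guesses this from the first bytes of the file. It assumes that v7.3 files carry the standard HDF5 signature after a 512-byte MATLAB text header and that older Level 5 files (v5/v6/v7) start with the text "MATLAB 5.0 MAT-file". The function name and return values are made up for illustration.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# The HDF5 format allows its signature at offset 0 or at 512, 1024, 2048, ...
# Assumption: checking the first few candidates is enough for MAT files.
CANDIDATE_OFFSETS = (0, 512, 1024, 2048)


def mat_file_flavor(path):
    """Guess the MAT-file flavor of `path` from its leading bytes.

    Returns "hdf5" (v7.3), "level5" (v5/v6/v7), or "unknown".
    """
    with open(path, "rb") as f:
        head = f.read(max(CANDIDATE_OFFSETS) + len(HDF5_SIGNATURE))
    for offset in CANDIDATE_OFFSETS:
        if head[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return "hdf5"    # readable with a general-purpose HDF5 library
    if head.startswith(b"MATLAB 5.0 MAT-file"):
        return "level5"      # needs a MAT-specific reader
    return "unknown"
```

A file reported as "hdf5" can then be handed to whatever HDF5 binding the language offers, while "level5" files need a reader that understands the older binary layout.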

  • Simplify the data structures you're saving, preferring large arrays of primitives to complex container structures.
  • Try turning off compression if your data structures are still complex.
  • Try the v7.3 MAT file format by passing "-v7.3" to save.
  • If using a network file system, consider saving and loading to a temporary dir on a fast local drive and copying to/from the network.

For large data structures, your MAT file I/O speed may be determined more by the internal structure of the data you're writing out than by the size of the resulting MAT file itself. (In my experience, this has usually been the major factor in slow MAT files.) When you say "arbitrary Matlab structure", that suggests you might be using cells, structs, or objects to make complex data structures. That slows down MAT I/O because there is per-array overhead in MAT file I/O, and the members of cell and struct arrays (container types) all count as separate arrays. For example, 5,000 strings stored in a cellstr are much, much slower to save and load than the same 5,000 strings stored in a 2-D char array. And objects have even more overhead.

As a test, try writing out a 1 GB file that contains just a 1 GB primitive array of random uint8s, and see how long that takes. From there, see if you can simplify your data to reduce the total mxArray count, even if that means reshaping it for serialization. (My experience with this is mostly with the v7 format; the newer HDF5 format may have less per-element overhead.)
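The cellstr versus char array contrast can be illustrated outside MATLAB too. The following Python sketch is only an analogy, not how MATLAB serializes anything: it packs many separate strings into one space-padded, fixed-width buffer, which is roughly the shape a 2-D char array has, so that a single large array is written instead of thousands of small ones. The function names are hypothetical, and it assumes trailing spaces in the strings are not significant.

```python
def pack_strings(strings, encoding="ascii"):
    """Pack strings into one space-padded, fixed-width byte buffer.

    Returns (row_count, row_width, buffer): one flat array instead of
    one small object per string.
    """
    encoded = [s.encode(encoding) for s in strings]
    width = max((len(e) for e in encoded), default=0)
    buffer = b"".join(e.ljust(width, b" ") for e in encoded)
    return len(encoded), width, buffer


def unpack_strings(rows, width, buffer, encoding="ascii"):
    """Inverse of pack_strings; trailing padding spaces are stripped."""
    return [
        buffer[i * width:(i + 1) * width].rstrip(b" ").decode(encoding)
        for i in range(rows)
    ]
```

The trade-off is the same one a rectangular char array makes: every row costs as much as the longest string, and trailing whitespace is lost on the way back.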

If your data files live on the network, you could also try doing the save and load operations on temporary files on fast local drives, and separately using copy operations to move them to and from the network. At least on Windows networks, I've seen speedups of up to 2x from doing this, possibly due to optimizations that a full-file copy operation can make and the MAT I/O code can't.
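The same pattern applies to the non-MATLAB programs that write these files. A minimal Python sketch of it, using only the standard library, might look like the following. The helper name `save_via_local_temp` and the `write_file` callback are hypothetical, and it assumes the system temp directory sits on a fast local drive.

```python
import os
import shutil
import tempfile


def save_via_local_temp(write_file, dest_path):
    """Write to a local temp file, then copy it to dest_path in one pass.

    `write_file` is any callable that takes a path and writes the data there.
    """
    suffix = os.path.splitext(dest_path)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the writer reopens the file
    try:
        write_file(tmp_path)                  # many small writes hit local disk
        shutil.copyfile(tmp_path, dest_path)  # one sequential copy to the share
    finally:
        os.remove(tmp_path)                   # clean up even if writing fails
```

Loading works the same way in reverse: copy the file from the network to a local temp path first, then read it from there.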

It would probably be a substantial effort to come up with an alternate file format that supported fully arbitrary Matlab data structures and was portable to other languages. I'd try making smaller changes around your use of the existing format first.

Andrew Janke answered Sep 17 '22 23:09