
How to load large .mat files in python?

I have a very large .mat file (~1.3 GB) that I am trying to load in my Python code (IPython notebook). I tried:

import scipy.io as sio
very_large = sio.loadmat('very_large.mat')

My laptop, which has 8 GB of RAM, then hangs. I kept the system monitor open and saw memory consumption rise steadily to 7 GB before the system froze.

What am I doing wrong? Any suggestions or workarounds?

EDIT:

More details on the data: Here is the link to the data: http://ufldl.stanford.edu/housenumbers/

The file I am interested in is extra_32x32.mat. From the description: "Loading the .mat files creates 2 variables: X which is a 4-D matrix containing the images, and y which is a vector of class labels. To access the images, X(:,:,:,i) gives the i-th 32-by-32 RGB image, with class label y(i)."

For example, loading a smaller .mat file from the same page (test_32x32.mat) in the following way:

import numpy as np

SVHN_full_test_data = sio.loadmat('test_32x32.mat')
print("\nData set = SVHN_full_test_data")
for key, value in SVHN_full_test_data.items():
    print("Type of", key, ":", type(value))
    # Arrays can be large, so print only their shape; print other entries in full.
    if isinstance(value, np.ndarray):
        print("Shape of", key, ":", value.shape)
    else:
        print("Content:", value)

produces:

Data set = SVHN_full_test_data
Type of y : <type 'numpy.ndarray'>
Shape of y : (26032, 1)
Type of X : <type 'numpy.ndarray'>
Shape of X : (32, 32, 3, 26032)
Type of __version__ : <type 'str'>
Content: 1.0
Type of __header__ : <type 'str'>
Content: MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Mon Dec  5 21:18:15 2011
Type of __globals__ : <type 'list'>
Content: []
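
For reference, the MATLAB indexing in the dataset description maps onto NumPy roughly as follows. This is only a sketch: the arrays are synthetic stand-ins with the same layout as the shapes printed above, and get_image is a hypothetical helper name, not part of scipy or the dataset tooling.

```python
import numpy as np

# Synthetic stand-ins with the same layout as the SVHN arrays:
# X is (32, 32, 3, N) and y is (N, 1).
n_images = 5
X = np.zeros((32, 32, 3, n_images), dtype=np.uint8)
y = np.arange(1, n_images + 1).reshape(-1, 1)


def get_image(X, y, i):
    """Return the i-th image (0-based) and its label as a plain int."""
    return X[:, :, :, i], int(y[i, 0])


# MATLAB's X(:,:,:,1) and y(1) correspond to index 0 in Python.
image, label = get_image(X, y, 0)

# Many Python libraries expect the sample axis first: (N, 32, 32, 3).
X_samples_first = np.moveaxis(X, -1, 0)
```

Note that np.moveaxis returns a view, so reordering the axes this way does not copy the image data.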

user42388 asked Aug 25 '16 19:08



1 Answer

This answer depends on two assumptions:

  • The .mat file is saved as MAT version 7.3 (which seems to be HDF5-compliant, although MathWorks doesn't go as far as guaranteeing it), or could be saved via a direct write to HDF5 format (with MATLAB's hdf5write(), or the newer h5create() and h5write()).

  • You're able to import and use third-party packages in Python, namely pandas.
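
A quick way to check the first assumption is to read the descriptive text at the start of the file. The sketch below assumes the usual MAT-file convention that version 5 and version 7.3 files begin with a text header such as "MATLAB 5.0 MAT-file, ..." or "MATLAB 7.3 MAT-file, ..." (the same text scipy exposes as __header__ in the question's output). mat_header and is_v73 are hypothetical helper names.

```python
def mat_header(path):
    """Return the descriptive text stored in the first 116 bytes of a MAT-file."""
    with open(path, "rb") as f:
        raw = f.read(116)
    # The text field is padded; strip trailing spaces and NUL bytes.
    return raw.decode("latin-1").rstrip("\x00 ")


def is_v73(path):
    """True if the header text declares a version 7.3 (HDF5-based) MAT-file."""
    return mat_header(path).startswith("MATLAB 7.3 MAT-file")
```

Going by the header printed in the question, test_32x32.mat is a version 5.0 file, so a file like that would need to be re-saved before the steps below apply.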

Approach

Given those assumptions, the approach I'd use is:

  1. Ensure the .mat file is saved in an HDF5-compatible form. This might mean converting it using MATLAB's matfile(), which won't load it all into memory, or it could be done once on a machine with more RAM.

  2. Use pandas to read part of the HDF5-compliant .mat file into a data frame.

  3. Use the data frame for your onward analysis in Python.
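
As a rough illustration of step 2, here is a sketch that reads a v7.3 file one batch at a time. It swaps in h5py instead of pandas, because pandas.read_hdf is built on PyTables and is mainly intended for files written by pandas itself, so it may not open a MATLAB-written file directly. The assumptions are that h5py is installed, that the file holds a variable named X as in the question, and that the axes appear reversed in HDF5 (a MATLAB array of size 32x32x3xN shows up with shape (N, 3, 32, 32)). iter_slices and iter_v73_images are hypothetical helper names.

```python
import numpy as np


def iter_slices(dataset, batch_size=1024):
    """Yield (start, batch) pairs, slicing along the first axis.

    Works on anything that supports slicing and has a .shape, such as a
    NumPy array or an h5py dataset. With h5py, only the requested slice
    is read from disk.
    """
    n = dataset.shape[0]
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        yield start, np.asarray(dataset[start:stop])


def iter_v73_images(path, name="X", batch_size=1024):
    """Yield image batches of shape (n, 32, 32, 3) from a v7.3 MAT-file."""
    import h5py  # third-party; imported lazily, assumed to be installed

    with h5py.File(path, "r") as f:
        for start, batch in iter_slices(f[name], batch_size):
            # Undo the axis reversal: (n, 3, 32, 32) -> (n, 32, 32, 3).
            yield start, batch.transpose(0, 3, 2, 1)
```

Each batch is an ordinary NumPy array, so it can be processed and discarded before the next one is read, which keeps peak memory close to the size of a single batch.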

Notes:

Pandas data frames work very well with numpy and scipy in general. So if you can read your data into a frame, you'll probably be able to do what you want with it from there.

The answer to this SO question shows you how to read only part of an hdf5 datafile into memory (a pandas data frame) at a time, based on a condition (index range, or some logical condition e.g. WHERE something=somethingelse).

Mini-rant

MATLAB has supported its latest version 7.3 MAT-files for 12 years now, but still doesn't save to that version by default (it's a disk space thing: v7.3 files are larger in some situations but way more versatile to use), so anyone using default MATLAB settings won't be generating v7.3 MAT-files. 12 years on, we have loads of disk space, but this kind of thing still causes problems. It's time to upgrade your default flag, MathWorks!

Hope that helps,

Tom

thclark answered Oct 12 '22 23:10