Let's say I generate the following toy dataset from Matlab, and I save it as a mat file:
>> arr = rand(100);
>> whos arr
Name Size Bytes Class Attributes
arr 100x100 80000 double
>> save('arr.mat', 'arr')
The saved arr.mat file is 75829 bytes in size, according to the output of the ls command.
If I load the same file using scipy.io.loadmat() and save it again using scipy.io.savemat():
from scipy import io
arr = io.loadmat('arr.mat')
with open('arrscipy.mat', 'wb') as f:  # MAT-files are binary, so open in 'wb' mode
    io.savemat(f, arr)
I obtain a file with a considerably different size (∼ 4KB larger):
$ ls -al
75829 Nov 6 11:52 arr.mat
80184 Nov 6 11:52 arrscipy.mat
I now have two binary mat files containing the same data. My understanding is that the size of a binary mat file is determined by the size of the variables it contains, plus some overhead for file headers. However, the sizes of these two files differ considerably. Why is this? Is it a data format problem?
I tried this with arrays of structures too, and the result is similar: scipy-saved mat files are larger than Matlab-saved ones.
Look at the docs:
scipy.io.savemat(file_name, mdict, appendmat=True, format='5',
long_field_names=False, do_compression=False, oned_as='row')
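Compression is turned off by default in scipy.io.savemat(). MATLAB, by contrast, compresses by default when it saves in its standard (version 7) mat file format, which is why its file is smaller. Pass do_compression=True to get a file of comparable size.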
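To see the effect of the flag, here is a minimal sketch that saves the same random 100x100 array twice, once with the default settings and once with do_compression=True, and compares the file sizes. The file names and the temporary directory are only illustrative, and the exact byte counts will vary with the data.

```python
import os
import tempfile

import numpy as np
from scipy import io

# Same shape and dtype as the MATLAB example: 100x100 doubles, 80000 bytes of data.
arr = np.random.rand(100, 100)

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, 'arr_plain.mat')    # illustrative file names
    packed = os.path.join(tmp, 'arr_packed.mat')

    io.savemat(plain, {'arr': arr})                        # default: do_compression=False
    io.savemat(packed, {'arr': arr}, do_compression=True)  # compress the variable

    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)

    # The compressed file should still load back to identical data.
    roundtrip = io.loadmat(packed)['arr']

print('uncompressed:', plain_size, 'bytes')
print('compressed:  ', packed_size, 'bytes')
```

Random doubles compress poorly, so the saving is only a few percent here, which is consistent with the roughly 4 KB gap in the question. Real data with more regularity usually shrinks far more.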
There's a catch when you set do_compression=True: MATLAB may be unable to load large files that were saved this way.
In my case, mat files under 2 GB loaded in MATLAB (2017b) without any problem, whether do_compression was True or False. But when I tried to load a 2.25 GB mat file saved using scipy.io.savemat() with compression, MATLAB failed to load it, even though I could load it from Python using loadmat().
According to the scipy.io.savemat documentation, the default format='5' covers mat files for MATLAB 5 and up (to 7.2), and it is the newest format savemat can write. MATLAB's save() documentation, however, says that files over 2 GB need to be saved with '-v7.3'. I think scipy's savemat() fails to save such files correctly because it doesn't support the version 7.3 format that mat files larger than 2 GB require.
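If you are unsure which format a given mat file is in, the text header at the start of the file can tell you. The following is a rough sketch, assuming the header begins with "MATLAB <version> MAT-file" as it does for version 5 and version 7.3 files. The function name mat_file_version is made up for this example, and the demo writes a synthetic header rather than a real mat file.

```python
import os
import tempfile


def mat_file_version(path):
    """Return the version string from a mat file's text header, or None.

    Hypothetical helper: assumes the file starts with 'MATLAB <version> MAT-file'.
    """
    with open(path, 'rb') as f:
        header = f.read(116)  # the descriptive text occupies the first 116 bytes
    text = header.decode('ascii', errors='replace')
    prefix = 'MATLAB '
    if not text.startswith(prefix):
        return None  # e.g. a version 4 file, which has no text header
    return text[len(prefix):].split(' ')[0]


# Demo on a synthetic header only (not a valid mat file).
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, 'fake.mat')
    with open(fake, 'wb') as f:
        f.write(b'MATLAB 7.3 MAT-file, Platform: GLNXA64'.ljust(116))
    detected = mat_file_version(fake)

print(detected)
```

Files written by scipy.io.savemat() with format='5' should report 5.0, and as far as I know MATLAB's compressed version 7 files carry the same 5.0 header text, so this only separates the 7.3 format from the older ones.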
Hopefully scipy will have an upgrade to fix this problem.