I have a large quantity of data (>800 MB) that takes an age to load into MATLAB, mainly because it's split up into tiny files, each <20 kB. They are all in a proprietary format which I can read and load into MATLAB; it's just that it takes so long.
I am thinking of reading the data in and writing it out to some sort of binary file which should make it quicker for subsequent reads (of which there may be many, hence me needing a speed-up).
So, my question is, what would be the best format to write them to disk to make reading them back again as quick as possible?
I guess I have the option of writing with fwrite, or just saving the variables from MATLAB. I think I'd prefer the fwrite option so that, if needed, I could read the data from another package or language.
Look into the HDF5 data format, which MATLAB uses as the underlying format for version 7.3 .mat files. You can create your own HDF5 files with the hdf5write function (newer releases recommend h5create and h5write instead), and the resulting file can be read from any language that has HDF5 bindings (most common languages do, or at least offer a way to integrate C code that can call the HDF5 library).
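As a rough illustration of the HDF5 route, here is a minimal sketch that writes one numeric matrix to an HDF5 file and reads it back. It uses h5create, h5write and h5read, the high-level HDF5 functions in newer MATLAB releases, rather than hdf5write. The random matrix, the file name and the dataset name /data are all made up for the example; your real data would come from your proprietary-format reader.

```matlab
% Hypothetical stand-in for the values parsed from the many small files.
data = rand(1000, 50);

% Hypothetical output file; remove any old copy because h5create
% raises an error if the dataset already exists.
h5file = fullfile(tempdir, 'combined_example.h5');
if exist(h5file, 'file')
    delete(h5file);
end

% Create a double-precision dataset of the right size, then fill it.
h5create(h5file, '/data', size(data));
h5write(h5file, '/data', data);

% Later sessions (or other languages with HDF5 bindings) can read it back.
readBack = h5read(h5file, '/data');
```

Storing everything in one file means a single open and one large read instead of thousands of small ones, which is likely where most of the load time currently goes.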
If your data is numeric (and of the same datatype), you might find it hard to beat the performance of plain binary (fwrite).
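A minimal sketch of the plain-binary approach, assuming a single matrix of doubles. The file name, the little-endian byte order and the two-value uint32 header holding the matrix size are choices made for this example, not a standard layout; whatever reads the file from another language would need to follow the same convention.

```matlab
% Hypothetical stand-in for the values parsed from the many small files.
data = rand(1000, 50);
binFile = fullfile(tempdir, 'combined_example.bin');   % hypothetical name

% Write: fix the byte order explicitly so other tools can rely on it.
fid = fopen(binFile, 'w', 'ieee-le');
if fid == -1
    error('Could not open %s for writing.', binFile);
end
fwrite(fid, size(data), 'uint32');   % header: number of rows, then columns
fwrite(fid, data, 'double');         % payload, written in column-major order
fclose(fid);

% Read: recover the size from the header, then the payload in one call.
fid = fopen(binFile, 'r', 'ieee-le');
if fid == -1
    error('Could not open %s for reading.', binFile);
end
dims = fread(fid, [1 2], 'uint32');      % returned as doubles by default
readBack = fread(fid, dims, 'double');   % reshaped to rows x columns
fclose(fid);
```

Note that MATLAB writes matrices column by column, so a reader in a row-major language would need to transpose or reinterpret the dimensions.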
Binary MAT-files are the fastest. Just use:
save myfile.mat <var_a> <var_b> ...
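For use inside a script, the same thing can be written with the function form of save and load. This is only a sketch: the variable name data and the file name are hypothetical placeholders for your own.

```matlab
% Hypothetical stand-in for the values parsed from the many small files.
data = rand(1000, 50);
matFile = fullfile(tempdir, 'combined_example.mat');   % hypothetical name

% Function form of "save myfile.mat data": variables are named as strings.
save(matFile, 'data');

% Loading into a struct avoids overwriting variables in the workspace.
loaded = load(matFile, 'data');
readBack = loaded.data;
```

Wrapping the load call in tic and toc is a simple way to compare this against the other formats on your own data.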