
Memory map file in MATLAB?

Tags: matlab, bigdata

I have decided to use memmapfile because my data (typically 30 GB to 60 GB) is too big to fit in my computer's memory.

My data files consist of two columns of data that correspond to the outputs of two sensors, and I have them in both .bin and .txt formats.

m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)

I used the above code to memory-map my data to a variable "m", but I have no idea which data format to use ('int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'single', or 'double'). In fact, I tried every format listed that MATLAB supports, but when I use m.data(index) I never get a pair of numbers (the two columns of data), which is what I expected. Also, the number I get differs depending on the format I use.

If anyone has experience with memmapfile please help me.

Here are some smaller versions of my data files so people can understand how my data is structured:

cheers James

Asked Jan 06 '14 15:01 by James Archer


1 Answer

memmapfile is designed for reading binary files, which is why you are having trouble with your text file. The data in there are characters, so you'll have to read them as characters and then parse them into numbers. More on that below.

The binary file appears to contain more than just a stream of floating-point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them how to read such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this, as the floating-point numbers in your text file might be truncated, i.e., you may have lost precision compared to reading the binary representation of the floats directly.
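If the vendor's documentation turns out to describe a simple layout, memmapfile can map the sensor pairs directly instead of one scalar at a time. The sketch below assumes a hypothetical layout: a fixed-size header (headerBytes, a made-up value) followed by interleaved [sensor1 sensor2] pairs stored as native-endian single values. It writes such a file to a temporary location so it runs on its own; substitute the real header size and data type once you know them.

```matlab
% HYPOTHETICAL layout: headerBytes of header, then interleaved
% [sensor1 sensor2] pairs stored as single (float32), native byte order.
headerBytes = 512;                 % hypothetical; use the documented size
fname = [tempname '.bin'];

% Build a small file with that layout so the sketch is self-contained
samples = single([0.398516 0.063440; 0.399611 0.063284; 0.398985 0.061253]);
fid = fopen(fname, 'w');
fwrite(fid, zeros(1, headerBytes, 'uint8'), 'uint8');  % dummy header bytes
fwrite(fid, samples.', 'single');  % row by row: s1, s2, s1, s2, ...
fclose(fid);

% Skip the header and map each pair as one record with a 1x2 field 'pair'
m = memmapfile(fname, 'Offset', headerBytes, ...
               'Format', {'single', [1 2], 'pair'});

% Gather all records into an N-by-2 matrix. On a 30-60 GB file, index
% a range of records instead, e.g. vertcat(m.Data(1:1e6).pair).
d = vertcat(m.Data.pair);

clear m            % release the mapping before deleting the file
delete(fname);
```

The structured Format makes each element of m.Data one sample pair, which is the "pair of numbers" behaviour the question expected; with a plain 'single' format you would instead get a flat vector to reshape into two columns.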

Ok, so how to read your file with memmapfile? This post provides some hints.

So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a 1-byte datatype, matching the one byte per character in this ASCII file):

m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off

We can render the data read in as uint8 as characters by casting it to char:

c = char(m.Data(1:57)).' % read the first three lines (3 x 19 bytes). NB: transpose just for getting nice output, don't use it in your code
c = 
    0.398516    0.063440
    0.399611    0.063284
    0.398985    0.061253

As each line in your file has the same length (2*8 characters for the numbers, 1 tab, and 2 characters for the Windows CR/LF line ending = 19 bytes), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38) the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
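The index arithmetic above generalises to any block of lines. A minimal sketch, assuming every line is exactly 19 bytes as described; lineRange is a hypothetical helper name used only for illustration:

```matlab
% Assumes every line is exactly 19 bytes: 8 + TAB + 8 + CR + LF
bytesPerLine = 19;

% Hypothetical helper: byte indices covering lines firstLine .. firstLine+nLines-1
lineRange = @(firstLine, nLines) (firstLine-1)*bytesPerLine + (1:nLines*bytesPerLine);

r1  = lineRange(1, 1);   % 1:19  -> first line
r2  = lineRange(2, 1);   % 20:38 -> second line
r23 = lineRange(2, 2);   % 20:57 -> second and third lines
```

With this, char(m.Data(lineRange(k, N))) returns N complete lines starting at line k.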

Then we'll have to parse the read-in data into floating point numbers:

f = sscanf(c,'%f')
f =
    0.3985
    0.0634
    0.3996
    0.0633
    0.3990
    0.0613

All that's left now is to reshape them into your two column format

d = reshape(f,2,[]).'
d =
    0.3985    0.0634
    0.3996    0.0633
    0.3990    0.0613
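Putting the three steps (map, cast to char, sscanf plus reshape) together, a whole file can be processed in fixed-size chunks so that only one chunk is ever parsed into memory. This is a sketch assuming the 19-byte line layout described above; it writes a small demo file to a temporary location, and Nlines = 2 is chosen only so the demo has several chunks (real data would use something like 1e5).

```matlab
% Demo file: fixed-width lines "x.xxxxxx<TAB>x.xxxxxx<CR><LF>" (19 bytes each)
fname = [tempname '.txt'];
vals = [0.398516 0.063440; 0.399611 0.063284; 0.398985 0.061253; ...
        0.401002 0.060918; 0.397771 0.062005];
fid = fopen(fname, 'w');                 % 'w' is binary mode, so \r\n is written as-is
fprintf(fid, '%.6f\t%.6f\r\n', vals.');  % one row of vals per line
fclose(fid);

bytesPerLine = 19;
Nlines = 2;                              % lines per chunk (demo value)
m = memmapfile(fname, 'Format', 'uint8');
totalLines = floor(numel(m.Data) / bytesPerLine);  % assumes only complete lines

out = zeros(0, 2);   % demo only: real code would process d instead of collecting it
for first = 1:Nlines:totalLines
    n = min(Nlines, totalLines - first + 1);            % last chunk may be shorter
    idx = (first-1)*bytesPerLine + (1:n*bytesPerLine);  % byte range of this chunk
    c = char(m.Data(idx)).';                            % chunk as text
    d = reshape(sscanf(c, '%f'), 2, []).';              % n-by-2 values
    out = [out; d];                                     %#ok<AGROW>
end

clear m              % release the mapping before deleting the file
delete(fname);
```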

Easier ways than using memmapfile: You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:

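Nlines = 10000;                    % number of lines to read per call
fid = fopen('RTL5_57.txt');        % default mode 'r' keeps the CR/LF bytes, so each line stays 19 bytes
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values in the text file directly with f = fscanf(fid,'%f',2*Nlines).
% However, in testing, I have found calling fread followed by sscanf to be faster,
% which will make a significant difference when reading such large files.
% Call fclose(fid) once you are done with the file.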

Using this you can read Nlines pairs of values at a time, process them, and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use the same call to get the next lines. It's thus easy to write a loop that processes the whole file, testing with feof(fid) whether you are at the end of the file.
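A sketch of such a loop, assuming the same 19-byte line layout; it creates a small temporary demo file so it runs on its own, and uses Nlines = 2 only so the demo spans several reads:

```matlab
% Demo file with fixed-width CR/LF lines (19 bytes each)
fname = [tempname '.txt'];
vals = [0.398516 0.063440; 0.399611 0.063284; 0.398985 0.061253; ...
        0.401002 0.060918; 0.397771 0.062005];
fid = fopen(fname, 'w');
fprintf(fid, '%.6f\t%.6f\r\n', vals.');
fclose(fid);

Nlines = 2;                      % lines per fread call (demo value)
fid = fopen(fname, 'r');         % binary mode: keeps CR/LF, so lines stay 19 bytes
out = zeros(0, 2);               % demo only: real code would process d in the loop
while ~feof(fid)
    c = fread(fid, Nlines*19, '*char');  % up to Nlines lines; fewer at end of file
    if isempty(c), break; end            % file size was an exact multiple of the chunk
    d = reshape(sscanf(c, '%f'), 2, []).';
    out = [out; d];                      %#ok<AGROW>
end
fclose(fid);
delete(fname);
```

feof only becomes true after a read reaches the end of the file, which is why the empty-chunk check is there.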

An even easier way is suggested here: use textscan. To slightly adapt their example code:

Nlines = 10000;

% describe the format of the data
% for more information, see the textscan reference page
fmt = '%f\t%f';   % named fmt rather than format to avoid shadowing MATLAB's format function

fid = fopen('RTL5_57.txt');

while ~feof(fid)
   C = textscan(fid, fmt, Nlines, 'CollectOutput', true);
   d = C{1};  % immediately clear C at this point if you need the memory! 
   % process d
end

fclose(fid);

Note again that fread followed by sscanf will be fastest. However, the fread method fails as soon as a single line in the text file doesn't exactly match the fixed 19-byte layout, whereas textscan tolerates whitespace variations and is thus more robust.

Answered Oct 17 '22 03:10 by Diederick C. Niehorster