 

MATLAB: How to create multiple mapped memory files with a simple "iterator"?

I have more than 100 files, each containing recorded data sets laid out like this:

  • file0: [no. of data sets in file, no. of data points for recording1, related data to recording1, no. of data points for recording2, related data to recording2, ... , no. of data points for recordingM, related data to recordingM]
  • file1: [no. of data sets in file, ...] (same as above)

All of the data together may exceed 20 GB, so loading it all into memory is not an option. Hence, I would like to create a memory map for each file while hiding the complexity of the underlying data layout from the "user", e.g., I would like to be able to operate on the data like this:

for i=1:TotalNumberOfRecordings
    recording(i) = recording(i) * 10;        % some dummy data operation
                                             % or, even better:
    recording(i).relatedData = 2000;
end

So, no matter whether recording(i) is in file0, file1, or some other file, and no matter where it sits within that file, I want a list that lets me access the recording and its related data via a memory map.

What I have so far is a list of all files in a certain directory. My idea was to simply create a list like this:

entry1: [memoryMappedFileHandle, dataRangeOfRecording]
entry2: [memoryMappedFileHandle, dataRangeOfRecording]

And then use this list to further abstract files and recordings. I started with this code:

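fileList = getAllFiles(directoryName);   % cell array of file paths
list = []; n = 0;
for file = 1:length(fileList)
   m = memmapfile(fileList{file});
   for trace = 1:numberOfTracesInFile   % count taken from the file's header
       n = n+1;
       list = [list; [n, m]];           % this line raises the error below
   end
end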

However, I get the error:

Memmapfile objects cannot be concatenated

I'm quite new to MATLAB, so this is probably a bad idea after all. How can I do this better? Is it possible to create a memory-mapped table that spans multiple files?

asked Jan 10 '15 at 17:01 by user26372



1 Answer

I'm not sure whether the core of your question is specifically about memory-mapped files, or about whether there is a way to seamlessly process data from multiple large files without the user needing to bother with the details of where the data is.

To address the second question, MATLAB R2014b introduced a new datastore object that is designed to do pretty much this. Essentially, you create a datastore object that refers to your files, and you can then read data from it without worrying about which file it's in. datastore is also designed to work closely with the mapreduce functionality introduced in the same release, which makes it easy to parallelize map-reduce programming patterns and even to tie in with Hadoop.
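As far as I know, the R2014b datastore targets tabular text (and key-value) data, so it may not read this binary layout directly. Later releases add fileDatastore, which accepts a custom 'ReadFcn'. The sketch below assumes fileDatastore is available, that the files share the extension .bin, and that every value (counts included) is stored as a float64, which the question does not specify; the folder path is hypothetical. The ReadFcn returns a memmapfile object, so each file is mapped rather than loaded.

```matlab
% Hypothetical folder, as in the question's code; adjust to your data.
directoryName = 'path/to/recordings';

% fileDatastore iterates over every .bin file in the folder; the ReadFcn
% returns a (read-only) memmapfile, so no file is loaded into memory whole.
% ASSUMED layout per file, all float64: [M, N1, data1, ..., NM, dataM]
ds = fileDatastore(directoryName, 'FileExtensions', '.bin', ...
    'ReadFcn', @(f) memmapfile(f, 'Format', 'double'));

total = 0;                                  % recordings seen across all files
while hasdata(ds)
    m = read(ds);                           % memmapfile for the next file
    pos = 2;                                % element 1 holds M
    for r = 1:m.Data(1)
        n = m.Data(pos);                    % no. of data points in this recording
        rec = m.Data(pos + 1 : pos + n);    % copies just this recording into memory
        total = total + 1;
        % ... process rec here; write results elsewhere, since this map
        % is read-only (memmapfile's Writable defaults to false)
        pos = pos + 1 + n;
    end
end
```

Note that the datastore still hands out one file per read call, so the caller sees file boundaries; hiding them completely behind a single recording number needs custom bookkeeping of the kind described below.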

To answer the first question: I'm afraid you've already found your answer, which is that memmapfile objects cannot be concatenated, so no, it's not straightforward. I think your best approach would be to build your own class containing multiple memmapfile objects in a cell array, along with information about which data is in which file, and some sort of getData method that retrieves the requested data from the appropriate file. (This would basically be writing your own datastore class that works with memory-mapped files rather than ordinary files, so you might be able to borrow much of the design and/or implementation details from datastore itself.)
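A minimal sketch of the bookkeeping such a class would wrap, written as a plain script rather than a classdef to keep it short. It assumes every value in every file (counts included) is a float64, which the question does not specify; the sample data, file names, and the variables maps and index are hypothetical. The memmapfile objects live in a cell array, which sidesteps the concatenation error, and each row of index maps a global recording number to its file, first element, and length.

```matlab
% Hypothetical sample files in the ASSUMED layout (all float64):
% [M, N1, data1(1:N1), N2, data2(1:N2), ..., NM, dataM(1:NM)]
folder = tempname;
mkdir(folder);
samples = {{[1 2 3], [4 5]}, {[6 7 8 9]}};       % file 1: 2 recordings, file 2: 1
fileNames = cell(1, numel(samples));
for f = 1:numel(samples)
    payload = numel(samples{f});                 % header: number of recordings
    for r = 1:numel(samples{f})
        payload = [payload, numel(samples{f}{r}), samples{f}{r}]; %#ok<AGROW>
    end
    fileNames{f} = fullfile(folder, sprintf('file%d.bin', f - 1));
    fid = fopen(fileNames{f}, 'w');
    fwrite(fid, payload, 'double');
    fclose(fid);
end

% Keep the memmapfile objects in a cell array (no concatenation needed) and
% record, for every recording k, [file number, first element, data points].
maps = cell(1, numel(fileNames));
index = zeros(0, 3);
for f = 1:numel(fileNames)
    maps{f} = memmapfile(fileNames{f}, 'Format', 'double', 'Writable', true);
    pos = 2;                                     % element 1 holds M
    for r = 1:maps{f}.Data(1)
        n = maps{f}.Data(pos);
        index(end + 1, :) = [f, pos + 1, n];     %#ok<AGROW>
        pos = pos + 1 + n;
    end
end

% The loop from the question: the caller only deals with recording numbers k.
for k = 1:size(index, 1)
    elems = index(k, 2) : index(k, 2) + index(k, 3) - 1;
    maps{index(k, 1)}.Data(elems) = maps{index(k, 1)}.Data(elems) * 10;
end
```

Only the three-column lookup table lives in memory; the recordings stay on disk until indexed. Moving maps and index into class properties, with getData(k) and setData(k, values) methods built from the last loop, would give roughly the recording(i) interface the question asks for.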

answered Sep 24 '22 at 18:09 by Sam Roberts
