
How to memory map a huge matrix?

Tags: c++, mmap, d

Suppose you have a huge (40+ GB) matrix of floating-point feature values, where rows are different features and columns are the samples/images.

The matrix is precomputed column-wise. Afterwards it is read in full, row-wise, several times by multiple threads (each thread loads a whole row).

What would be the best way to handle this matrix? I'm especially pondering over 5 points:

  1. Since it runs on an x64 PC, I could memory map the whole matrix at once, but would that make sense?
  2. What are the effects of multithreading, including a multithreaded initial computation?
  3. How to layout the matrix: row or column major?
  4. Would it help to mark the matrix as read-only after the precomputation has been finished?
  5. Could something like http://www.kernel.org/doc/man-pages/online/pages/man2/madvise.2.html be used to speed it up?
Trass3r asked Jan 29 '11 20:01




1 Answer

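Memory mapping the whole file could make the process much easier: on a 64-bit system the address space is large enough for a 40+ GB mapping, and the kernel pages data in and out on demand.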
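As a rough illustration of mapping the whole file at once, here is a minimal sketch using the POSIX calls `open`, `ftruncate`, and `mmap`. The names `MappedFile`, `create_and_map`, and `unmap_file` are hypothetical helpers made up for this example, and it assumes Linux or another POSIX system.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type: one shared, writable mapping of a whole file.
struct MappedFile {
    void*       data = nullptr;  // start of the mapping, or nullptr on failure
    std::size_t size = 0;        // length of the mapping in bytes
};

// Create the file if needed, size it to `bytes`, and map all of it.
// MAP_SHARED means stores land in the page cache and are written back to the file.
inline MappedFile create_and_map(const char* path, std::size_t bytes) {
    MappedFile m;
    const int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return m;
    if (ftruncate(fd, static_cast<off_t>(bytes)) == 0) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            m.data = p;
            m.size = bytes;
        }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    return m;
}

// Release the mapping and reset the handle.
inline void unmap_file(MappedFile& m) {
    if (m.data != nullptr) munmap(m.data, m.size);
    m = MappedFile{};
}
```

For the matrix in the question, `bytes` would be `rows * cols * sizeof(float)`. The file can be sparse at first; pages only get backed by storage once they are written.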

You want to lay out your data to optimize for the most common access pattern. It sounds like the data is going to be written once (column-wise) and read several times (row-wise). That suggests the data should be stored in row-major order.
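To make the row-major choice concrete, here is a small sketch of the index arithmetic. `RowMajorView` is a hypothetical name for this example, and `base` is assumed to point at the start of the matrix data (for instance a memory-mapped region).

```cpp
#include <cstddef>

// Hypothetical view over a rows x cols matrix of floats stored in row-major order.
struct RowMajorView {
    float*      base;  // first element of the matrix
    std::size_t rows;  // number of features
    std::size_t cols;  // number of samples

    // Pointer to the first element of row r; the row is contiguous in memory.
    float* row(std::size_t r) const { return base + r * cols; }

    // Element (r, c) lives at linear offset r * cols + c.
    float& at(std::size_t r, std::size_t c) const { return base[r * cols + c]; }
};
```

With this layout each reader thread walks one contiguous run of `cols` floats. The flip side is that the column-wise precomputation writes with a stride of `cols * sizeof(float)`, so it may be worth computing several columns at a time before moving down the rows.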

Marking the matrix read-only once the pre-computation is done probably won't help performance (there are some possible low-level optimizations, but I don't think anything implements them), but it will catch bugs that accidentally write to data you meant to leave alone. Might as well.
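On Linux, one way to mark an existing mapping read-only is `mprotect`. The sketch below assumes `addr` is page-aligned, which holds for a pointer returned by `mmap`; `make_read_only` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Drop write permission on [addr, addr + bytes).
// addr must be page-aligned, as a pointer returned by mmap is.
// Returns true on success; later writes to the range raise SIGSEGV.
inline bool make_read_only(void* addr, std::size_t bytes) {
    return mprotect(addr, bytes, PROT_READ) == 0;
}
```

Calling it once after the precomputation finishes turns a stray write into an immediate crash at the faulting instruction instead of silent corruption.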

madvise could end up being useful, once you've got your application written and working.
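If measurements later suggest trying it, a hint for a single row might look like the sketch below. `advise_row` is a hypothetical helper; it assumes a row-major matrix of floats starting at a page-aligned `base`, and it rounds the row start down to a page boundary because `madvise` requires a page-aligned address. Whether any particular advice value helps is something to measure, not assume.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Pass `advice` (e.g. MADV_SEQUENTIAL or MADV_WILLNEED) for the pages covering row r.
// Assumes row-major storage and a power-of-two page size.
inline bool advise_row(float* base, std::size_t cols, std::size_t r, int advice) {
    const auto page    = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    const auto begin   = reinterpret_cast<std::uintptr_t>(base + r * cols);
    const auto end     = begin + cols * sizeof(float);
    const auto aligned = begin & ~(page - 1);  // round down to a page boundary
    return madvise(reinterpret_cast<void*>(aligned), end - aligned, advice) == 0;
}
```

A reader thread could call this with `MADV_WILLNEED` on the next row it plans to read, or with `MADV_SEQUENTIAL` on its current row, and then compare timings with and without the hint.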

My overall advice: write the program in the simplest way you can, sequentially at first, and then put timers around the whole thing and the various major operations. Make sure the major operation times sum to the overall time, so you can be sure you're not missing anything. Then target your performance improvement efforts toward the components that are actually taking the most time.
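For the timers, something as simple as the following sketch is usually enough. `time_seconds` is a hypothetical helper built on `std::chrono::steady_clock`.

```cpp
#include <chrono>
#include <utility>

// Run f() once and return the elapsed wall-clock time in seconds.
template <class F>
double time_seconds(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Wrap the whole run in one call and each major phase (precomputation, each row-wise pass) in its own, then check that the phase times add up to roughly the total.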

Per JimR's mention of 4MB pages in his comment (on x86-64 the usual huge page size is 2 MB), you may end up wanting to look into hugetlbfs or using a Linux kernel release with transparent huge page support (merged for 2.6.38, could probably be patched into earlier versions). This would likely save you a whole lot of TLB misses, and convince the kernel to do the disk IO in sufficiently large chunks to amortize any seek overhead.

Phil Miller answered Oct 02 '22 17:10