Limit buffer cache used for mmap

Tags: c++, memory, mmap

I have a data structure that I'd like to rework to page out on-demand. mmap seems like an easy way to run some initial experiments. However, I want to limit the amount of buffer cache that the mmap uses. The machine has enough memory to page the entire data structure into cache, but for test reasons (and some production reasons too) I don't want to allow it to do that.

Is there a way to limit the amount of buffer cache used by mmap?

Alternatively, an mmap alternative that can achieve something similar and still limit memory usage would work too.

asked Jul 25 '17 by JaredC


2 Answers

From my understanding, it is not possible. Memory mapping is controlled by the operating system: the kernel decides how to use the available memory in the best way, but it looks at the system as a whole. I'm not aware of per-process quotas for the page cache being supported (at least, I have not seen such APIs on Linux or BSD).

There is madvise to give the kernel hints, but it does not support limiting the cache used by a single process. You can give it hints like MADV_DONTNEED, which will reduce the pressure your process puts on the cache of other applications, but I would expect it to do more harm than good: it will most likely make caching less efficient, which will lead to more IO load on the system in total.

I see only two alternatives. One is trying to solve the problem at the OS level, and the other is to solve it at the application level.

At the OS level, I see two options:

  1. You could run a virtual machine, but most likely this is not what you want. I would also expect that it will not improve the overall system performance. Still, it would be at least a way to define upper limits on the memory consumption.
  2. Docker is another idea that comes to mind. It also operates at the OS level, but to the best of my knowledge, it does not support defining cache quotas, so I don't think it will work.

That leaves only one option, which is to look at the application level. Instead of using memory mapped files, you could use explicit file system operations. If you need to have full control over the buffer, I think it is the only practical option. It is more work than memory mapping, and it is also not guaranteed to perform better.

If you want to stay with memory mapping, you could also map only parts of the file in memory and unmap other parts when you exceed your memory quota. It also has the same problem as the explicit file IO operations (more implementation work and non-trivial tuning to find a good caching strategy).

Having said that, you could question the requirement to limit the cache memory usage. I would expect the kernel to do a pretty good job of allocating memory resources; at the very least, it will likely do better than the solutions I have sketched. (Explicit file IO plus an internal cache might be fast, but it is not trivial to implement and tune. Here is a comparison of the trade-offs: mmap() vs. reading blocks.)

During testing, you could run the application with ionice -c 3 and nice -n 20 to somewhat reduce the impact on the other productive applications. There is also a tool called nocache. I have never used it, but reading through its documentation, it seems related to your question.

answered Oct 17 '22 by Philipp Claßen


It might be possible to accomplish this through the use of mmap() and Linux control groups (cgroups). Once cgroups are set up, you can place arbitrary limits on, among other things, the amount of physical memory used by a process. As an example, here we limit physical memory to 128 MB and memory-plus-swap to 256 MB (cgroup v1 layout; cgcreate and cgexec come from libcgroup, and ./my_experiment is a placeholder for your binary):

cgcreate -g memory:/limitMemory
echo $(( 128 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.limit_in_bytes    # physical memory cap, includes page cache charged to the group
echo $(( 256 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.memsw.limit_in_bytes    # memory + swap cap
cgexec -g memory:limitMemory ./my_experiment    # run the process inside the group
answered Oct 17 '22 by David Hoelzer