How to know whether a pointer is in physical memory or it will trigger a Page Fault?

If I have a pointer and care about memory access performance, I would like to check whether the next access through it will trigger a page fault. If it will, an algorithm could reorder its loop operations to minimize page faults.

Is there a portable way (or a non-portable one for Linux or Windows) to check whether accessing a particular memory address will trigger a page fault?

AndresR asked Jun 11 '16 13:06



3 Answers

About ten years ago, Emery Berger proposed a VM-aware garbage collection strategy which required the application to know which pages were present in memory. For testing purposes, he and his students produced a kernel patch which notified the application of paging events using real-time signals, allowing the garbage collector to maintain its own database of resident pages. (Although that seems like duplication of effort, it is a lot more efficient than multiple system calls in order to obtain information every time it is needed.)

You can find information about this interesting research on his research page.

As far as I know, there is no implementation of this patch for a recent Linux kernel, but it would always be possible to resurrect it.

rici answered Oct 02 '22 12:10


On Linux there is a mechanism, see man proc:

/proc/[pid]/pagemap: This file shows the mapping of each of the process's virtual pages into physical page frames or swap area. It contains one 64-bit value for each virtual page, with the bits set as follows:

  • Bit 63: if set, the page is present in RAM.
  • Bit 62: if set, the page is in swap space.
  • ...

For example,

$ sudo hexdump -e '/0 "%08_ax "' -e '/8 "%016X" "\n"' /proc/self/pagemap 
00000000 0600000000000000
*
00002000 A6000000000032FE
00002008 A60000000007F3A6
00002010 A600000000094560
00002018 A60000000008D0C0
00002020 A60000000009EBE6
00002028 A6000000000C8E87
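
The same lookup can be done from inside a program. The following is a minimal C sketch, not taken from the man page: it assumes Linux, assumes the process is allowed to read /proc/self/pagemap, and uses only the two bits quoted above. The function names are hypothetical, chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() under strict -std modes */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t even on 32-bit targets */

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

static_assert(sizeof(off_t) >= 8, "pagemap offsets need a 64-bit off_t");

/* Bit 63 of a pagemap entry: the page is present in RAM. */
static int pagemap_entry_present(uint64_t entry)
{
    return (int)((entry >> 63) & 1);
}

/* Bit 62 of a pagemap entry: the page is in swap space. */
static int pagemap_entry_swapped(uint64_t entry)
{
    return (int)((entry >> 62) & 1);
}

/* Read the pagemap entry for the page containing addr.
 * Returns 0 on success, -1 if the entry could not be read. */
static int read_pagemap_entry(const void *addr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size)
                   * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(uint64_t), offset);
    close(fd);
    return n == (ssize_t)sizeof(uint64_t) ? 0 : -1;
}

/* Returns 1 if the page holding addr is present, 0 if it is not,
 * and -1 if the information is unavailable. */
static int page_is_resident(const void *addr)
{
    uint64_t entry;
    if (read_pagemap_entry(addr, &entry) != 0)
        return -1;
    return pagemap_entry_present(entry);
}
```

A caller might check `page_is_resident(p) == 1` before deciding which block to process next. The result is only a snapshot: the kernel can evict the page or bring it in at any moment after the read, and opening the file on every query is slow, so a real implementation would probably keep the descriptor open.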

meuh answered Oct 02 '22 10:10


I wrote the page-info library to do this on Linux. It uses the pagemap file under the hood, so it is not portable to other operating systems.

Some information is restricted to root users, but you should be able to get the information about page presence (whether it is in RAM or not) without being root. Quoting from the README:

So [as a non-root user] you can determine if a page is present, swapped out, its soft-dirty status, whether it is exclusive and whether it is a file mapping, but not much more. On older kernels, you can also get the physical frame number (the pfn) field, which is essentially the physical address of the page (shifted right by 12).

Performance is not optimized for querying large ranges, since the library does a separate read for each page, but a PR to improve this would be gratefully accepted.
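
To illustrate what a batched query could look like, here is a rough C sketch that fetches the pagemap entries for a whole address range with a single pread() instead of one read per page. It is not page-info's API or implementation; the function names are hypothetical, and it assumes Linux with a readable /proc/self/pagemap, where bit 63 of each 64-bit entry marks a page present in RAM.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() under strict -std modes */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t even on 32-bit targets */

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

static_assert(sizeof(off_t) >= 8, "pagemap offsets need a 64-bit off_t");

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / page_size;
    uintptr_t last = (addr + len - 1) / page_size;
    return (size_t)(last - first) + 1;
}

/* Count how many pages of [addr, addr + len) are present in RAM.
 * Returns the count, or -1 if the pagemap could not be read. */
static long count_resident_pages(const void *addr, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    size_t npages = pages_spanned((uintptr_t)addr, len, (size_t)page_size);
    if (npages == 0)
        return 0;

    uint64_t *entries = malloc(npages * sizeof *entries);
    if (entries == NULL)
        return -1;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) {
        free(entries);
        return -1;
    }

    /* Entries for consecutive virtual pages are contiguous in the file,
     * so one read covers the whole range. */
    off_t offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size)
                   * (off_t)sizeof(uint64_t);
    size_t want = npages * sizeof *entries;
    ssize_t got = pread(fd, entries, want, offset);
    close(fd);

    long resident = -1; /* a short read is treated as failure here */
    if (got == (ssize_t)want) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += (long)((entries[i] >> 63) & 1); /* bit 63: present */
    }
    free(entries);
    return resident;
}
```

For example, `count_resident_pages(buf, buf_len)` returns how many pages of a buffer are currently in RAM, at the cost of one system call for the read rather than one per page.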

BeeOnRope answered Oct 02 '22 11:10