
Cost of a page fault trap

I have an application which periodically (every 1 or 2 seconds) takes checkpoints by forking itself. Each checkpoint is a fork of the original process which stays idle until it is asked to take over because an error has occurred in the original process.
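To make the scheme concrete, here is a minimal sketch of such a checkpoint. It is an illustration only, not the asker's actual code: the function name `checkpoint_demo`, the use of pipes to wake the checkpoint, and the sample state are all assumptions.

```python
import os

def checkpoint_demo():
    """Fork a dormant checkpoint, change state in the parent, then wake the
    checkpoint and return (parent_state, checkpoint_state)."""
    state = bytearray(b"before")      # hypothetical application state

    wake_r, wake_w = os.pipe()        # parent -> checkpoint: "take over"
    out_r, out_w = os.pipe()          # checkpoint -> parent: reported state

    pid = os.fork()
    if pid == 0:                      # child: this is the checkpoint
        try:
            os.close(wake_w)
            os.close(out_r)
            woke = os.read(wake_r, 1)  # idle until a byte (or EOF) arrives
            if woke:
                # The child still sees the contents from fork time.
                os.write(out_w, bytes(state))
        finally:
            os._exit(0)

    os.close(wake_r)
    os.close(out_w)
    state[:] = b"after!"              # parent writes; the child's view is unchanged
    os.write(wake_w, b"x")            # simulate an error: wake the checkpoint
    os.close(wake_w)
    snapshot = os.read(out_r, 64)
    os.close(out_r)
    os.waitpid(pid, 0)
    return bytes(state), snapshot
```

Calling `checkpoint_demo()` returns the parent's modified state together with the state the checkpoint reports, which is the value from the moment of the fork.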

Now my question is: how costly is the copy-on-write mechanism of fork? How much does the page fault trap cost that occurs whenever the original process writes to a memory page for the first time after taking a checkpoint? The copy-on-write mechanism makes sure that the original process gets a different physical page than the checkpoint.

In my opinion, the page fault trap overhead could be quite high: an interrupt occurs, we go from user space to kernel space, and then back from kernel space to user space. How many CPU cycles can I lose to such a page fault trap? Assume that the RAM is big enough and we never need to swap to the hard disk.

I know that it's difficult to imagine a checkpointing scheme more efficient than this, so you could ask why I'm worrying about page fault trap overhead, but I'm asking just to get an idea of how much this scheme will cost.

pythonic asked Apr 19 '12 07:04



1 Answer

You can do the rough math for an educated guess yourself. Assuming no disk access (~10 billion cycles), you have to account for

  • 160 cycles for the trap and returning (approximately, on x86_64)
  • validity checks, quota, accounting, and whatnot (unknown, probably a few hundred to a thousand cycles)
  • aligned memcpy of 4096 bytes, something around 500-800 cycles
  • TLB invalidation (adds 10-100 cycles on first access)
  • either eviction of other cached data or one guaranteed cache miss (80-400 cycles), depending on the implementation of the memcpy. Which of the two is better depends heavily on your access pattern.

So all in all, we're talking about something around 2000 cycles, with some of the effects (e.g. TLB and cache effects) being spread out and not immediately visible. Omondi and Sedukhin reported 1700 cycles on P-III back in 2003, which is consistent with this estimate.
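As a rough check of that arithmetic, the per-step figures from the list can be summed. This is only a sketch: the lower bound of 300 cycles for the kernel bookkeeping ("a few hundred") is an assumption, and the workload numbers in the usage note are hypothetical.

```python
# (low, high) cycle estimates per step, taken from the list in this answer.
# The 300-cycle lower bound for kernel bookkeeping is an assumed reading
# of "a few hundred".
COSTS = {
    "trap and return": (160, 160),
    "kernel checks and accounting": (300, 1000),
    "4 KiB memcpy": (500, 800),
    "TLB invalidation": (10, 100),
    "cache miss or eviction": (80, 400),
}

def fault_cycles():
    """Return the (low, high) total cycle estimate for one COW fault."""
    low = sum(lo for lo, _ in COSTS.values())
    high = sum(hi for _, hi in COSTS.values())
    return low, high

def overhead_fraction(dirty_pages, interval_s, clock_hz, cycles_per_fault=2000):
    """Fraction of one core's time spent on COW faults between checkpoints."""
    return dirty_pages * cycles_per_fault / (interval_s * clock_hz)
```

`fault_cycles()` gives a range of 1050 to 2460 cycles, which brackets the 2000-cycle figure. For a hypothetical process that dirties 10,000 pages per one-second checkpoint interval on a 3 GHz core, `overhead_fraction(10_000, 1.0, 3e9)` comes to roughly 0.7% of one core.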

Note that if the page has never been written to before, things are slightly different according to a comment by L. Torvalds back in 2000. A copy-on-write miss on a zero page pulls another zero page from the pool and doesn't copy zeroes. That's pretty much a guaranteed cache miss too, though.
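If you would rather measure than estimate, something like the following sketch gives a ballpark on your own machine. It is not a rigorous benchmark: the function names are made up for illustration, interpreter overhead is only cancelled approximately by subtracting a second pass over the same pages, and transparent huge pages or other kernel settings can skew the result. The pages are written once before the fork, so this measures the copying case, not the zero-page case.

```python
import mmap
import os
import time

PAGE = mmap.PAGESIZE

def touch(buf, npages):
    """Write one byte to each page and return the elapsed nanoseconds."""
    t0 = time.perf_counter_ns()
    for off in range(0, npages * PAGE, PAGE):
        buf[off] = 1
    return time.perf_counter_ns() - t0

def measure_cow(npages=4096):
    """Approximate extra nanoseconds per page for the first write after fork."""
    # MAP_PRIVATE is required; a shared mapping would not be copied on write.
    buf = mmap.mmap(-1, npages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    touch(buf, npages)                # populate the pages before forking

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: keeps the old pages referenced
        try:
            os.close(w)
            os.read(r, 1)             # block until the parent closes the pipe
        finally:
            os._exit(0)

    os.close(r)
    first = touch(buf, npages)        # each write should take a COW fault
    second = touch(buf, npages)       # pages are private now, no faults expected
    os.close(w)                       # release the child
    os.waitpid(pid, 0)
    buf.close()
    return (first - second) / npages
```

Multiply the returned nanoseconds by your clock frequency in GHz to get an approximate cycle count per fault, and run it several times, since individual runs are noisy.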

Damon answered Sep 28 '22 23:09