I am an operating system hobbyist, and my kernel runs on 80486+, and already supports virtual memory.
Starting from 80386, the x86 processor family by Intel and various clones thereof has supported virtual memory with paging. It is well known that when the PG
bit in CR0
is set, the processor uses virtual address translation. Then, the CR3
register points to the top-level page directory, that is the root for 2-4 levels of page table structures that map the virtual addresses to physical addresses.
The processor does not consult these tables for each virtual address generated, instead caching them in a structure called Translation Lookaside Buffer, or TLB. However, when changes to the page tables are made, the TLB needs to be flushed. On 80386 processors, this flush would be done by reloading (MOV
) CR3
with the top level page directory address, or a task switch. This supposedly unconditionally flushes all the TLB entries. As I understand, it would be perfectly valid for a virtual memory system to always reload CR3 after any change.
This is wasteful, since the TLB would now throw out completely good entries, thus in 80486 processors the INVLPG
instruction was introduced. INVLPG
will invalidate the TLB entry matching the source operand address.
Yet starting with Pentium Pro, we also have global pages that are not flushed with the moves to CR3
or task switch; and AMD x86-64 ISA says that some upper level page table structures might be cached and not invalidated by INVLPG
. To get a coherent picture of what is needed and what is not needed on each ISA one would really need to download a 1000-page datasheet for a multitudes of ISAs released since 80s to read a couple pages therein, and even then the documents seem to be particularly vague as to the TLB invalidation and what happens if the TLB is not properly invalidated.
For the simplicity, one can assume that we are talking about a uniprocessor system. Also, it can be assumed that no task-switch is required after changing the page structures. (thus INVLPG
is always supposedly at least as good choice as reloading the CR3
register).
The base assumption is that one would need to reload CR3
after each change to page tables and page directories, and such a system would be correct. However, if one wants to avoid flushing the TLB needlessly, one needs answers to the 2 questions:
Provided that INVLPG
is supported on the ISA, after what kind of changes can one safely use it instead of reloading the CR3
? E.g. "If one unmaps one page frame (set the corresponding table entry to not present), one can always use INVLPG
"?
What kind of changes one can do to the tables and directories without touching either CR3
or executing INVLPG
? E.g. "If a page is not mapped at all (not present), one can write a PTE with Present=1
for it without flushing the TLB at all"?
Even after reading a quite a load of ISA documents and everything related to INVLPG
here on Stack Overflow I am not personally sure of either examples I presented there. Indeed, one notable post stated it right away: "I don't know exactly when you should use it and when you shouldn't." Thus any certain, correct examples, preferably documented, and for either IA32 or x86-64, that you can give, are appreciated.
Flushing of the TLB can be an important security mechanism for memory isolation between processes to ensure a process can't access data stored in memory pages of another process.
The INVLPG instruction is a privileged instruction. When the processor is running in protected mode, the CPL must be 0 to execute this instruction. The INVLPG instruction normally flushes TLB entries only for the specified page; however, in some cases, it may flush more entries, even the entire TLB.
Thus, if a page is moved in physical memory while it has an entry in the TLB, that TLB entry becomes invalid and must no longer be used. (Preventing the movement of all pages that could be shared with the accelerator is prohibitively expensive.) This invalidation of a TLB entry is called TLB invalidation.
In the simplest possible terms; the requirement is that anything the CPU's TLB could have remembered that has changed has to be invalidated before anything that relies on the change happens.
The things that the CPU's could have remembered include:
WARNING: Because Intel CPUs don't remember "not present" pages, documentation from Intel may say that you don't need to invalidate when changing a page from "not present" to "present". Intel's documentation is only correct for Intel CPUs. It is not correct for all 80x86 CPUs. Some CPUs (mostly Cyrix) do remember when a page was "not present" and because of those CPUs you do have to invalidate when changing a page from "not present" to "present".
Note that due to speculative execution you can not cut corners. For example, if you know a page has never been accessed you can't assume it's not in the TLB because the TLB may have been speculatively fetched.
I have chosen the words "before anything that relies on the change happens" very carefully. Modern CPUs (especially for long mode) do cache the higher level paging structures (e.g. PDPT entries) and not just the final pages. This means that if you change a higher level paging structure but the page table entries themselves remain the same, you still need to invalidate.
It also means that it is possible to skip the invalidation if nothing relies on the change. A simple example of this is with the accessed and dirty flags - if you aren't relying on these flags (to determine "least recently used" and which pages to send to swap space) then it doesn't matter much if the CPU doesn't realise that you've change them. It is also possible (not recommended for single-CPU but very recommended for multi-CPU) to skip the TLB invalidation in cases where you'd get a page fault if the CPU is using the old/stale TLB information, where the page fault handler invalidates if and only if it's actually necessary.
In addition; "anything the CPU's TLB could have remembered" is a little tricky. Often an OS will map the paging structures themselves into the virtual address space to allow fast/easy access to them (e.g. the common "recursive mapping" trick where you pretend the page directory is a page table). In this case when you change a page directory entry you need to invalidate the effected normal pages (as you'd expect) but you also need to invalidate anything the change effected in any mappings.
For which to use (INVLPG or reloading CR3) there are several issues. For a single page INVLPG will be faster. If you change a page directory (effecting 1024 pages or 512 pages, depending on which flavour of paging) then using INVLPG in a loop may or may not be more expensive that just reloading CR3 (it depends on CPU/hardware, and the access patterns for the code following the invalidation).
There are 2 more issues that come into this. The first is task switching. When switching between tasks that use different virtual address spaces you have to change CR3. This means that if you change something that effects a large area (e.g. a page directory) you can improve overall performance by doing a task switch early, rather than reloading CR3 now (for invalidation) and then reloading CR3 soon after (for the task switch). Basically, it's a "kill 2 birds with one stone" optimisation.
The other thing is "global pages". Typically there's pages that are the same in all virtual address spaces (e.g. the kernel). When you reload CR3 (e.g. during a task switch) you don't want TLBs for the pages that remain the same to be invalidated for no reason, because that would hurt performance more than necessary. To fix that and improve performance, (for Pentium and later) there's a feature called "global pages" where you get to mark these common pages as global and they are not invalidated when you reload CR3. In that case, if you need to invalidate global pages you need to use either INVPLG or change CR4 (e.g. disable and then reenable the global pages feature). For larger areas (e.g. changing a page directory and not just one page) it's the same as before (messing with CR4 may be faster or slower than INVLPG in a loop).
To your first question:
INVLPG
and you can do any change possible. Use of INVLPG
is always safe.CR3
does not invalidate global pages in the TLB. So sometimes you must use INVLPG
as reloading CR3
has no effect.INVLPG
must be used for every page involved. If you are changing multiple pages at a time then there comes a point where reloading CR3
is faster than a multitude of INVLPG
calls.To your second question:
A page that is not mapped can not be cached in the TLB (assuming you invalidated it correctly when you unmapped it previously). So any change from not-present does not need INVLPG
or CR3
reloading.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With