
Why does x86 paging have no concept of privilege rings?

Back in 1982, when Intel released the 80286, they added 4 privilege levels to the segmentation scheme (rings 0-3), specified by a 2-bit field in each segment descriptor of the Global Descriptor Table (GDT) and Local Descriptor Table (LDT).

In the 80386 processor, Intel added paging, but surprisingly, it only has 2 privilege levels (supervisor and user), specified by a single bit in the Page Directory Entry (PDE) and Page Table Entry (PTE).

This means that an OS that only uses paging (like most modern OSes) is unable to benefit from the existence of rings 1 and 2, which could be very useful, for example, for drivers. (Win9x, for instance, crashed frequently because it loaded buggy, unchecked drivers into ring 0.)

From the POV of portability, the existence of rings 1 and 2 is a quirk of the x86 architecture, and portable OSes shouldn't use them, because most other architectures only have 2 privilege levels.

But I am sure that portability to other platforms is not what Intel engineers were thinking about back in 1985 when they were designing the 386.

So why didn't Intel allow paging to have 4 privilege levels, like segmentation?

DarkAtom asked Feb 04 '21 20:02





3 Answers

One guess that occurs to me is that Intel intended ring 1 code, while it is running, to be the supervisor, "supervising" ring 3 code, rather than ring 1 running under ring 0.

If the ring 1 code wants to call ring 0 code, it can call through a call-gate, and the ring 0 code can change CR3 to a page table that includes mappings for physical pages that weren't present in the page table the ring 1 or 2 code was using.

I really don't know a lot about this stuff, but https://wiki.osdev.org/Task_State_Segment shows that the TSS includes a CR3 field, so using hardware task-switching I'm guessing that a far call through a task gate (an ordinary call gate changes privilege level but doesn't switch tasks) can trigger the CR3 change directly. (So the call target does not already have to be mapped, otherwise ring 1 / 2 code could have modified it. Or it could be mapped read-only, along with the page table itself and the GDT, to stop the ring 1 code from taking over ring 0 by modifying it.)

"This means that an OS that only uses paging [...] unable to benefit from the existence of rings 1 and 2"

That's your mistake: you can't "only use paging". Even making interrupt handling from user-space work on a normal x86 OS (with a flat memory model) requires setting up TSS stuff to set ESP to the kernel stack pointer when switching to kernel mode, even if you don't otherwise use hardware task-switching.
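To make the stack-switch part concrete, here is a minimal Python sketch (a model, not OS code) of how the 32-bit TSS holds one stack pointer per inner ring, plus the CR3 field mentioned above. The field offsets follow the 32-bit TSS layout; `make_tss` and `stack_for_ring` are hypothetical helper names, and the selector and stack values in the usage note are made up for illustration.

```python
import struct

# Byte offsets of a few fields in the 32-bit TSS (104 bytes in total).
TSS_SIZE = 104
TSS_OFFSETS = {
    "esp0": 0x04, "ss0": 0x08,   # stack used when entering ring 0
    "esp1": 0x0C, "ss1": 0x10,   # stack used when entering ring 1
    "esp2": 0x14, "ss2": 0x18,   # stack used when entering ring 2
    "cr3": 0x1C,                 # page-directory base, loaded on a hardware task switch
}


def make_tss(**fields):
    """Build a zeroed TSS image with the named fields filled in (hypothetical helper)."""
    image = bytearray(TSS_SIZE)
    for name, value in fields.items():
        struct.pack_into("<I", image, TSS_OFFSETS[name], value)
    return bytes(image)


def stack_for_ring(tss, target_ring):
    """Return the (ss, esp) pair stored in the TSS for entering ring 0, 1 or 2."""
    if target_ring not in (0, 1, 2):
        raise ValueError("the TSS only holds stacks for rings 0-2")
    (esp,) = struct.unpack_from("<I", tss, TSS_OFFSETS[f"esp{target_ring}"])
    (ss,) = struct.unpack_from("<I", tss, TSS_OFFSETS[f"ss{target_ring}"])
    return ss & 0xFFFF, esp  # only the low 16 bits of the SS slot are the selector
```

A flat-model kernel that never uses rings 1 and 2 would only fill in the ring 0 pair, e.g. `make_tss(ss0=0x10, esp0=0xC0100000)`, and leave the other slots zero.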

x86 has "task gates" and "call gates" and all kinds of really complex stuff I hope I don't ever have to fully understand, but I expect that spending some time reading up on it might shed some light on the kind of things the architects of 386 thought OSes might want to do.

Separate from my previous guess (about ring 1 supervising ring 3), perhaps Intel expected OSes to use segmentation to separate ring 1 / 2 from ring 0 memory in the same page table if desired (see Footnote 1). As you say, they probably weren't trying to create something that portable microkernel OSes could just use as a bonus.

A kernel has the luxury of deciding the layout of virtual address space, so it could well assign chunks of that for use by ring 1 code, setting up CS/DS/ES/SS appropriately when calling it.

I think that would have to mean a non-flat model, though, because x86 segmentation makes addresses go from 0..limit, not e.g. allowing access to a range of virtual addresses from low..high without changing the meaning of a pointer.
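A rough Python model of why that forces a non-flat model: with segmentation, an offset is checked against the limit and then added to the base, so the same pointer value means a different linear address in each segment. This sketch assumes a plain expand-up data segment with a byte-granular limit; `segment_translate` and `SegmentLimitFault` are hypothetical names, and the base/limit values are invented.

```python
class SegmentLimitFault(Exception):
    """Stands in for the fault raised when an offset exceeds the segment limit."""


def segment_translate(base, limit, offset):
    """Translate a segment-relative offset to a 32-bit linear address.

    Valid offsets run from 0 to limit inclusive; the window cannot start
    at an arbitrary "low" offset, only at offset 0.
    """
    if not 0 <= offset <= limit:
        raise SegmentLimitFault(f"offset {offset:#x} outside 0..{limit:#x}")
    return (base + offset) & 0xFFFFFFFF


# The same pointer value lands at different linear addresses
# depending on which segment it is used with.
flat = segment_translate(base=0x00000000, limit=0xFFFFFFFF, offset=0x1000)
ring1 = segment_translate(base=0x40000000, limit=0x0000FFFF, offset=0x1000)
```

Here `flat` is `0x1000` while `ring1` is `0x40001000`, so a pointer handed from the flat ring 0 code to the ring 1 code would need rebasing.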

Footnote 1:

Is it necessary to have full memory protection between ring 0 and ring 1? An OS might use ring 1 for semi-trusted code.

Some privileged instructions require ring 0, so running code in ring 1 would stop it from executing them by accident. IO privilege level can be set separately to allow cli and in/out in ring > 0, but other instructions like invlpg, lgdt, and mov cr, reg require actual ring 0.
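As a sketch of that rule, here is a small Python model of the two kinds of check. It is a simplification under stated assumptions: it only covers the instructions named above, and it ignores the TSS I/O permission bitmap, which can additionally allow in/out on specific ports when CPL > IOPL. `may_execute` is a hypothetical name.

```python
# Instructions gated by the IOPL field in EFLAGS: allowed when CPL <= IOPL.
IOPL_SENSITIVE = {"cli", "in", "out"}

# Instructions that require CPL == 0 regardless of IOPL.
RING0_ONLY = {"invlpg", "lgdt", "mov cr"}


def may_execute(instr, cpl, iopl):
    """Return True if the instruction would run without a protection fault.

    Simplified model: the I/O permission bitmap in the TSS is not modelled.
    """
    if instr in RING0_ONLY:
        return cpl == 0
    if instr in IOPL_SENSITIVE:
        return cpl <= iopl
    raise ValueError(f"instruction {instr!r} is not covered by this sketch")
```

With `iopl=1`, ring 1 code can run `cli` and `in`/`out` (`may_execute("cli", cpl=1, iopl=1)` is true) but still cannot run `lgdt`, which is the "semi-trusted" arrangement described above.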

Peter Cordes answered Nov 15 '22 09:11



There are four privilege levels (called rings) in 386 protected mode as well as in 286: ring 0 has the highest privilege (operating system), rings 1 and 2 are not widely used, and ring 3 has the lowest privilege (user application). Rings 0-2 are called "Supervisor", while ring 3 is called "User".

The current privilege level (CPL) is held in the low 2 bits of the CS register and normally equals the Descriptor Privilege Level (DPL) of the code segment you are executing from. For more information about the current privilege level, see CPL vs. DPL vs. RPL.
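For illustration, a segment selector can be split into its fields in a few lines of Python. The bit layout (privilege level in bits 0-1, table indicator in bit 2, descriptor index in bits 3-15) is the architectural one; `decode_selector` is a hypothetical name and the selector values in the usage note are arbitrary examples, not taken from any particular OS.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": selector >> 3,       # which descriptor in the table
        "ti": (selector >> 2) & 1,    # 0 = GDT, 1 = LDT
        "rpl": selector & 3,          # for the value in CS, this is the CPL
    }


def current_privilege_level(cs):
    """CPL is the low two bits of the selector currently loaded in CS."""
    return cs & 3
```

For example, `decode_selector(0x23)` gives index 4, TI 0 and privilege level 3, while `current_privilege_level(0x10)` is 0.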

The bit you are referring to is bit 2 of a 32-bit Page-Directory Entry (PDE) that maps a 4MB page (or of a 32-bit PDE that references a page table). It is called "User/Supervisor" (U/S). A value of 0 means that user-mode accesses to the 4MB region controlled by this entry are not allowed. This does not mean that there are, as you wrote, just "2 privilege levels (supervisor and user)": the "supervisor" level still consists of three rings which, together with the user ring, make four rings in total.
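A small Python sketch of pulling that bit (and its neighbours) out of a 32-bit PDE value. The bit positions are the architectural ones for 32-bit paging; `decode_pde` is a hypothetical name and the entry values in the usage note are made up.

```python
PDE_PRESENT = 1 << 0   # P: entry is valid
PDE_WRITABLE = 1 << 1  # R/W: writes allowed
PDE_USER = 1 << 2      # U/S: 1 = user-mode accesses allowed, 0 = supervisor only
PDE_LARGE = 1 << 7     # PS: entry maps a 4MB page instead of pointing to a page table


def decode_pde(pde):
    """Return the main flag bits of a 32-bit page-directory entry as booleans."""
    return {
        "present": bool(pde & PDE_PRESENT),
        "writable": bool(pde & PDE_WRITABLE),
        "user": bool(pde & PDE_USER),
        "large_page": bool(pde & PDE_LARGE),
    }
```

An entry such as `0x00400083` decodes as a present, writable, supervisor-only 4MB page; setting bit 2 (`0x00400087`) is the only change needed to open it up to ring 3, and there is no further bit that could single out ring 1 or ring 2.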

See section 4.6 of Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1:

Every access to a linear address is either a supervisor-mode access or a user-mode access. For all instruction fetches and most data accesses, this distinction is determined by the current privilege level (CPL): accesses made while CPL < 3 are supervisor-mode accesses, while accesses made while CPL = 3 are user-mode accesses.

Therefore, CPL can be 0, 1, 2 or 3, so all 4 rings still exist when paging is enabled.

Here is more information on the U/S flag from the manual mentioned above:

Some operations implicitly access system data structures with linear addresses [...] called implicit supervisor-mode accesses regardless of CPL. Other accesses made while CPL < 3 are called explicit supervisor-mode accesses. Access rights are also controlled by the mode of a linear address as specified by the paging-structure entries controlling the translation of the linear address. If the U/S flag (bit 2) is 0 in at least one of the paging-structure entries, the address is a supervisor-mode address. Otherwise, the address is a user-mode address.
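The two quoted rules combine into a short check, sketched here in Python. This is a deliberately narrow model: it only looks at the U/S flag, and it leaves out the present and R/W bits, implicit supervisor-mode accesses, and later additions such as SMEP/SMAP. The function names are hypothetical.

```python
US_FLAG = 1 << 2  # U/S flag, bit 2 of every paging-structure entry


def is_supervisor_access(cpl):
    """Accesses made while CPL < 3 are supervisor-mode accesses."""
    return cpl < 3


def is_user_address(entries):
    """An address is user-mode only if U/S is 1 in every entry used to translate it."""
    return all(entry & US_FLAG for entry in entries)


def us_check_passes(cpl, entries):
    """Return True if the U/S rule alone would permit the access."""
    return is_supervisor_access(cpl) or is_user_address(entries)
```

Calling `us_check_passes` with `cpl` set to 0, 1 or 2 gives the same answer for any set of entries, which is exactly why paging alone cannot protect ring 0 pages from ring 1 code.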

P.S. My answer does not address why there isn't the same memory protection between ring 1 and ring 0 as there is between ring 3 and rings 0/1/2, which makes rings 1 and 2 hard to use when a page-table entry can't distinguish them from ring 0. See the reply by Peter Cordes, which addresses this issue.

Maxim Masiutin answered Nov 15 '22 08:11



The desire is to protect stuff from other stuff. Before paging existed (and before 80x86 existed - the "4 rings" model dates back to Multics if not earlier) the easiest way was to use "rings".

With 4 rings you can have a "D can't access C, and they can't access B, and they all can't access A" arrangement. This is relatively awful for the opposite direction ("C can access everything in D regardless of whether it needs to or not") and relatively awful for granularity (e.g. if you want "C can access part of D but not all of D").

With paging, you can give each thing its own virtual address space and map anything anywhere to control access (as you can't access anything that isn't mapped into your virtual address space). You can still have "D can't access C, and they can't access B, and they all can't access A" (if that's what you actually want) just by mapping all pages belonging to D into A, B and C; and mapping all pages belonging to C into A and B; and so on. However, you can also have any other arrangement - e.g. simulate 10 rings instead of 4 rings, or let C access part of D (but not all of D) and part of B (but not all of B), or...
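As a toy illustration of that mapping trick, the Python sketch below treats each virtual address space as a plain set of page numbers and builds the ring-like arrangement by mapping every less privileged domain's pages into the more privileged ones. Everything here (`build_ring_address_spaces`, the domain names, the page numbers) is hypothetical; real page tables obviously carry far more state.

```python
def build_ring_address_spaces(pages_by_domain, order):
    """Build one address space (a set of pages) per domain.

    `order` lists domains from most to least privileged. Each domain maps
    its own pages plus the pages of every domain listed after it.
    """
    spaces = {}
    for position, domain in enumerate(order):
        visible = set()
        for other in order[position:]:
            visible |= set(pages_by_domain[other])
        spaces[domain] = visible
    return spaces


pages = {"A": {0, 1}, "B": {2}, "C": {3}, "D": {4, 5}}
spaces = build_ring_address_spaces(pages, ["A", "B", "C", "D"])
```

Here `spaces["A"]` contains every page while `spaces["D"]` contains only D's own pages. Nothing limits `order` to four entries, and a non-ring arrangement is just a different choice of which pages go into which set.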

The question then becomes: if paging alone is enough to simulate any number of rings (and more), why do we still have 2 rings?

The answer is that paging only controls access to things that are in memory (code, data), and doesn't/can't control access to things that aren't in memory (e.g. the CPU's control registers). 2 rings are still needed to control whether things that aren't in memory can/can't be accessed (e.g. whether a mov cr0, eax instruction will cause a general protection fault).

However, two things make this less obvious. First, switching between virtual address spaces has a cost, and people try to minimize it (e.g. by not giving shared libraries or individual device drivers their own virtual address spaces). Second, because paging was added (with backward-compatibility concerns) to a pre-existing "segmentation with 4 rings" design, scraps of that old design remain in use (e.g. the TSS, the IO permission system, etc).

Brendan answered Nov 15 '22 08:11
