Which addressing do x86/x86_64 processors use for caching in the L1, L2 and L3 (LLC) - physical or virtual (via PT/PTE and the TLB) - and does PAT (page attribute table) somehow affect it?
And is there a difference between drivers (kernel space) and applications (user space) in this case?
Short answer - Intel uses virtually indexed, physically tagged (VIPT) L1 caches: the L1 is 8-way associative, and selecting a set requires only the low 12 address bits, which are the same in the virtual and physical address. And what will be used for data exchange between threads executing on one core with HT?

Most general-purpose computers support virtual memory. Generally, the cache associated with each processor is accessed with a physical address obtained after translating the virtual address in a Translation Lookaside Buffer (TLB).
L2 and L3 caches are bigger than L1. They are extra caches built between the CPU and the RAM. Sometimes L2 is built into the CPU alongside L1. L2 and L3 caches take slightly longer to access than L1. The more L2 and L3 cache available, the faster a computer can run.
L1 is "level-1" cache memory, usually built onto the microprocessor chip itself. For example, the Intel MMX microprocessor came with 32 KB of L1. L2 (that is, level-2) cache memory may sit on a separate chip (possibly on an expansion card) that can be accessed more quickly than the larger "main" memory.
L2 cache may be embedded on the CPU, or it can be on a separate chip or coprocessor and have a high-speed alternative system bus connecting the cache and CPU. That way it doesn't get slowed by traffic on the main system bus. Level 3 (L3) cache is specialized memory developed to improve the performance of L1 and L2.
The answer to your question is: it depends. That's strictly a CPU design decision, which balances the tradeoff between performance and complexity.
Take for example recent Intel Core processors - they're physically tagged and virtually indexed (at least according to http://www.realworldtech.com/sandy-bridge/7/). This means that the caches can only complete lookups in pure physical address space, in order to determine whether the line is there or not. However, since the L1 is 32 KB and 8-way associative with 64-byte lines, it uses 64 sets, so you need only address bits 6 to 11 to find the correct set. As it happens, virtual and physical addresses are the same in this range, so you can look up the DTLB in parallel with reading a cache set - a known trick (see http://en.wikipedia.org/wiki/CPU_cache for a good explanation).
In theory one can build a virtually indexed + virtually tagged cache, which would remove the requirement to go through address translation (the TLB lookup, and also page walks in case of TLB misses). However, that would cause numerous problems, especially with memory aliasing - a case where two virtual addresses map to the same physical one.
Say core1 caches virtual addr A in such a fully-virtual cache (it maps to phys addr C, but we haven't done this check yet). core2 then writes to virtual addr B, which maps to the same phys addr C - this means we need some mechanism (usually a "snoop", a term coined by Jim Goodman) that goes and invalidates that line in core1, handling the data merge and coherency management if needed. However, core1 can't answer that snoop, since it doesn't know about virtual addr B and doesn't store physical addr C in the virtual cache. So you can see we have an issue, although this is mostly relevant for strict x86 systems; other architectures may be more lax and allow simpler management of such caches.
Regarding the other questions - there's no real connection with PAT that I can think of: the cache is already designed, and it can't change for different memory types. Same answer for the last question - the hardware is mostly beneath the user/kernel-mode distinction (except for the mechanisms it provides for security checking, mostly the various rings).