On a multiprocessor system, each core can have its own variables. As I understand it, they are different variables at different addresses, even though they are in the same process and have the same name.
But I am wondering how the kernel implements this. Does it set aside a piece of memory to hold all the per-CPU pointers, and redirect the pointer to the right address each time with a shift or something?
percpu sections, where N is the number of CPUs, and the section used by the bootstrap processor will contain an uninitialized variable created with the DEFINE_PER_CPU macro. The kernel provides an API for manipulating per-cpu variables, for example get_cpu_var(var).
A per-CPU variable in the Linux kernel is actually an array with one instance of the variable for each processor. Each processor works with its own copy of the variable; this can be done with no locking, and with no worries about cache line bouncing.
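To make the "array with one instance per processor" idea concrete, here is a minimal userspace sketch, not kernel code. All names (demo_slot, demo_hits, demo_count, demo_total) are hypothetical, and the 64-byte cache line size is an assumption that varies by hardware. Each slot is padded to its own cache line so that one CPU's updates do not touch the line another CPU is using.

```c
#include <stdalign.h>
#include <stddef.h>

#define DEMO_NR_CPUS    4   /* assumed CPU count for the sketch */
#define DEMO_CACHE_LINE 64  /* assumed cache line size in bytes */

/* One slot per CPU; the alignment pads each slot to a full cache line. */
struct demo_slot {
    alignas(DEMO_CACHE_LINE) unsigned long hits;
};

static struct demo_slot demo_hits[DEMO_NR_CPUS];

/* Each CPU only ever writes its own slot, so no lock is taken here. */
static void demo_count(int cpu)
{
    demo_hits[cpu].hits++;
}

/* A reader that wants the global value sums over all slots. */
static unsigned long demo_total(void)
{
    unsigned long sum = 0;
    for (size_t cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        sum += demo_hits[cpu].hits;
    return sum;
}
```

In the kernel the "cpu" index would come from something like smp_processor_id() with preemption disabled; in this sketch the caller simply passes it in.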
Normal global variables are not per-CPU. Automatic variables live on the stack, and different CPUs use different stacks, so they naturally get separate variables.
I guess you're referring to Linux's per-CPU variable infrastructure.
Most of the magic is here (asm-generic/percpu.h):
extern unsigned long __per_cpu_offset[NR_CPUS];
#define per_cpu_offset(x) (__per_cpu_offset[x])
/* Separate out the type, so (int[3], foo) works. */
#define DEFINE_PER_CPU(type, name) \
__attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
/* var is in discarded region: offset to particular copy we want */
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
The macro RELOC_HIDE(ptr, offset) simply advances ptr by the given offset in bytes (regardless of the pointer type).
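The difference between a byte offset and ordinary pointer arithmetic is easy to miss, so here is a small userspace sketch. DEMO_SHIFT_PTR is a hypothetical stand-in modeled on the idea behind RELOC_HIDE, not the kernel's definition; demo_pair and demo_read_shifted are likewise made up for illustration. It relies on the GNU __typeof__ extension, as the header above does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: add `off` BYTES to ptr and keep ptr's type. */
#define DEMO_SHIFT_PTR(ptr, off) \
    ((__typeof__(ptr))((uintptr_t)(ptr) + (off)))

static int demo_pair[2] = { 10, 20 };

/* Reads the int located byte_off bytes past demo_pair[0]. */
static int demo_read_shifted(size_t byte_off)
{
    return *DEMO_SHIFT_PTR(&demo_pair[0], byte_off);
}
```

With plain pointer arithmetic, `&demo_pair[0] + 1` moves by sizeof(int) bytes; with the macro, you have to pass sizeof(int) yourself to reach the second element, which is exactly what a table of byte distances needs.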
What does it do?
1. When you write DEFINE_PER_CPU(int, x), an integer per_cpu__x is created in the special .data.percpu section.
2. At boot, the kernel makes one copy of this section for each CPU (that part of the magic is not in the code above).
3. The __per_cpu_offset array is filled with the distances between the copies. Supposing 1000 bytes of per-CPU data are used, __per_cpu_offset[n] would contain 1000*n.
4. The symbol per_cpu__x will be relocated, during load, to CPU 0's per_cpu__x.
5. __get_cpu_var(x), when running on CPU 3, will translate to *RELOC_HIDE(&per_cpu__x, __per_cpu_offset[3]). This starts with CPU 0's x, adds the offset between CPU 0's data and CPU 3's, and finally dereferences the resulting pointer.