When I have per-CPU data structures, does it improve performance to have them on different pages?

Question

I have a small struct of per-CPU data in a Linux kernel module, where each CPU frequently writes and reads its own data. I know that I need to make sure these items of data aren't on the same cache line, because if they were then the cores would be forever dirtying each other's caches. However, is there anything at the page level that I need to worry about from an SMP performance point of view? That is, would there be any performance impact from padding these per-CPU structures out to 4096 bytes and aligning them?

This is on Linux 2.6 on x86_64.

(Points about whether it's worth optimising and suggestions that I go benchmark it aren't needed -- what I'm looking for is whether there's any theoretical basis for worrying about page alignment).

caf · Accepted Answer

Within a single NUMA node, placing the structures on different pages only helps if you want to apply different permissions to them, or map them individually into processes. For performance, being on different cache lines is sufficient.
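As a rough illustration of cache-line separation, here is a minimal userspace C11 sketch. The 64-byte line size, the `percpu_stats` struct, the `NR_CPUS_DEMO` count and the helper names are all assumptions made up for this example. In kernel code the same effect would normally come from the kernel's own annotations (for instance `____cacheline_aligned_in_smp`) rather than `_Alignas`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumption: common x86_64 line size */
#define NR_CPUS_DEMO    4   /* hypothetical CPU count for the demo */

/* Hypothetical per-CPU data. Aligning the first member to the cache
 * line size raises the alignment of the whole struct, so the compiler
 * also pads sizeof() up to a multiple of CACHE_LINE_SIZE. */
struct percpu_stats {
    _Alignas(CACHE_LINE_SIZE) uint64_t packets;
    uint64_t bytes;
};

static_assert(sizeof(struct percpu_stats) % CACHE_LINE_SIZE == 0,
              "each slot must occupy whole cache lines");

/* One slot per CPU; slot i is only ever touched by CPU i. */
static struct percpu_stats stats[NR_CPUS_DEMO];

/* Index of the cache line that contains address p. */
static size_t cache_line_of(const void *p)
{
    return (size_t)((uintptr_t)p / CACHE_LINE_SIZE);
}

/* Returns 1 if slots a and b overlap on any cache line, else 0. */
static int slots_share_cache_line(int a, int b)
{
    size_t a_first = cache_line_of(&stats[a]);
    size_t a_last  = cache_line_of((const char *)&stats[a] + sizeof(stats[a]) - 1);
    size_t b_first = cache_line_of(&stats[b]);
    size_t b_last  = cache_line_of((const char *)&stats[b] + sizeof(stats[b]) - 1);

    return a_first <= b_last && b_first <= a_last;
}
```

With this layout, `slots_share_cache_line(0, 1)` is 0 even though all four slots sit on the same page, which is the situation the answer describes as sufficient.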

On NUMA architectures, you may want to place a CPU's per-CPU structure on a page that is local to that CPU's node. Even then you wouldn't pad the structure out to a page size to achieve that, because the structures for all the CPUs within the same NUMA node can share a page.
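The arithmetic behind sharing a page within a node can be sketched as follows. This is a back-of-the-envelope helper, not kernel code: the 4096-byte page and 64-byte cache line are assumptions matching the question's x86_64 setup, and every name here is hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

#define DEMO_PAGE_SIZE  4096  /* assumption: 4 KiB pages, as in the question */
#define CACHE_LINE_SIZE 64    /* assumption: common x86_64 line size */

static_assert(DEMO_PAGE_SIZE % CACHE_LINE_SIZE == 0,
              "a page must hold a whole number of cache lines");

/* Round n up to the next multiple of align. */
static size_t round_up_to(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* How many cache-line-padded structures fit in one page. */
static size_t slots_per_page(size_t struct_size)
{
    return DEMO_PAGE_SIZE / round_up_to(struct_size, CACHE_LINE_SIZE);
}

/* Pages needed on one node when its CPUs share pages. */
static size_t pages_for_node(size_t struct_size, size_t cpus_on_node)
{
    size_t per_page = slots_per_page(struct_size);

    return (cpus_on_node + per_page - 1) / per_page;
}
```

For a 48-byte structure and 8 CPUs on a node, `pages_for_node(48, 8)` gives 1 page, whereas padding each structure to a full page would use 8 pages for the same data without any additional isolation between CPUs.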

Eric Seppanen · Answer

Even on a NUMA system, you probably won't benefit much from allocating memory pages local to each CPU (use kmalloc_node() if you want to try it).

Node-local memory is faster, but only for accesses that miss at every cache level. For anything used with any frequency, you probably won't be able to tell the difference. If you're allocating megabytes of CPU-local data, then it probably does make sense to allocate pages local to each CPU.

Tags:

c

memory-management

optimization

linux-kernel

kdt