Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are spinlocks a good choice for a memory allocator?

I've suggested to the maintainers of the D programming language runtime a few times that the memory allocator/garbage collector should use spinlocks instead of regular OS critical sections. This hasn't really caught on. Here are the reasons I think spinlocks would be better:

  1. At least in synthetic benchmarks that I did, it's several times faster than OS critical sections when there's contention for the memory allocator/GC lock. Edit: Empirically, using spinlocks didn't even have measurable overhead in a single-core environment, probably because locks need to be held for such a short period of time in a memory allocator.
  2. Memory allocations and similar operations usually take a small fraction of a timeslice, and even a small fraction of the time a context switch takes, making it silly to context switch in the case of contention.
  3. A garbage collection in the implementation in question stops the world anyhow. There won't be any spinning during a collection.

Are there any good reasons not to use spinlocks in a memory allocator/garbage collector implementation?

like image 449
dsimcha Avatar asked Dec 15 '09 21:12

dsimcha


1 Answers

  1. Obviously the worst-case behavior of a spinlock is awful (the OS scheduler just sees 30 CPU-bound threads, so it tries to give them all some CPU time; 29 of them spin like mad while the thread that holds the lock sleeps), so you should avoid them if you can. Plenty of people smarter than me claim spinlocks have no userspace use cases because of this.

  2. System mutexes should spin a little before putting the thread to sleep (or indeed making any kind of system call), so they can sometimes perform exactly the same as spinlocks even when there's some contention.

  3. An allocator can often practically eliminate lock contention by using the lock only to allocate pages to threads. Each thread is then responsible for partitioning its own pages. You end up acquiring the lock only once every N allocations, and you can configure N to whatever you like.

I consider 2 and 3 to be strong arguments that can't be effectively countered by synthetic benchmarks. You would need to show that the performance of a real-world program suffers.

like image 193
Jason Orendorff Avatar answered Oct 30 '22 22:10

Jason Orendorff