Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

difference between pthread_spinlock and boost::smart_ptr::spinlock?

I found the following spinlock code in boost::smart_ptr:

bool try_lock()
{
    return (__sync_lock_test_and_set(&v_, 1) == 0);
}
void lock()
{    
    for (unsigned k=0; !try_lock(); ++k)
    {
        if (k<4)
            ; // spin
        else if (k < 16)
            __asm__ __volatile__("pause"); // was ("rep; nop" ::: "memory")
        else if (k < 32 || k & 1)
            sched_yield();
        else
        {
            struct timespec rqtp;
            rqtp.tv_sec  = 0;
            rqtp.tv_nsec = 100;
            nanosleep(&rqtp, 0);
        }
    }
}
void unlock()
{
     __sync_lock_release(&v_);
}

So if I understand this correctly, when the lock is contended the incoming thread will exponentially back-off, first spinning wildly, then pausing, then yielding the remainder of its time slice, and finally flip-flopping between sleeping and yielding.

I also found the glibc pthread_spinlock implementation, which uses assembly to perform the lock.

#define LOCK_PREFIX "lock;" // using an SMP machine

int pthread_spin_lock(pthread_spinlock_t *lock)
{
    __asm__ ("\n"
       "1:\t" LOCK_PREFIX "decl %0\n\t"
       "jne 2f\n\t"
       ".subsection 1\n\t"
       ".align 16\n"
       "2:\trep; nop\n\t"
       "cmpl $0, %0\n\t"
       "jg 1b\n\t"
       "jmp 2b\n\t"
       ".previous"
       : "=m" (*lock)
       : "m" (*lock));

    return 0;
}

I will admit that my understanding of assembly is not great, so I don't fully understand what is happening here. (Could someone please explain what this is doing?)

However, I ran some tests against the boost spinlock and glibc pthread_spinlock, and when there are more cores than threads, the boost code outperforms the glibc code.

On the other hand, When there are more threads than cores, the glibc code is better.

Why is this? What is the difference between these two spinlock implementations that makes them perform differently in each scenario?

like image 894
Steve Lorimer Avatar asked Jun 20 '12 01:06

Steve Lorimer


1 Answers

Where did you get the pthread_spin_lock() implementation posted in the question? It seems to be missing a couple important lines.

The implementation I see (which isn't inline assembly - it's a stand-alone assembly source file from glibc/nptl/sysdeps/i386/pthread_spin_lock.S) looks similar, but has the two additional critical instructions:

#include <lowlevellock.h>

    .globl  pthread_spin_lock
    .type   pthread_spin_lock,@function
    .align  16
pthread_spin_lock:
    mov 4(%esp), %eax
1:  LOCK
    decl    0(%eax)
    jne 2f
    xor %eax, %eax
    ret

    .align  16
2:  rep
    nop
    cmpl    $0, 0(%eax)
    jg  1b
    jmp 2b
    .size   pthread_spin_lock,.-pthread_spin_lock

It decrements a long pointed to by the passed in parameter and returns if the result is zero.

Otherwise, the result was non-zero which means this thread didn't acquire the lock. So it performs a rep nop, which is equivalent to the pause instruction. This is a 'special' nop that gives a hint to the CPU that the thread is in a spin, and the cpu should handle memory ordering and/or branch prdiction in some manner that improves performance in these situations (I don't pretend to understand exactly what happens differently under the chip's covers - from the software's point of view, there's no difference from a plain old nop).

After the pause it checks the value again - if it's greater than zero, the lock is unclaimed so it jumps to the top of the function and tries to claim the lock again. Otherwise, it jumps to the pause again.

The main difference between this spinlock and the Boost version is that this one never does anything fancier than a pause when it's spinning - there's nothing like a sched_yield() or nanosleep(). So the thread stays hot. I'm not sure precisely how this plays in the two behaviors you noted, but the glibc code will be greedier - if a thread is spinning on the lock and there are other threads ready to run but no available core, the spinning thread doesn't help the waiting thread get any cpu time, while the Boost version will eventually voluntarily make way for the thread that's waiting for some attention.

like image 75
Michael Burr Avatar answered Sep 22 '22 01:09

Michael Burr