
Any single-consumer single-producer lock free queue implementation in C?

I'm writing a program with a consumer thread and a producer thread, and it seems queue synchronization is a big overhead in the program. I looked for some lock-free queue implementations, but only found Lamport's version and an improved version from PPoPP '08:

#include <errno.h>   /* EWOULDBLOCK */
#include <stddef.h>

#define QUEUE_SIZE 4096
#define NEXT(i)    (((i) + 1) % QUEUE_SIZE)

void  *buffer[QUEUE_SIZE];   /* NULL marks an empty slot */
size_t head = 0;             /* written only by the producer */
size_t tail = 0;             /* written only by the consumer */

int enqueue_nonblock(void *data) {
    if (NULL != buffer[head]) {
        return EWOULDBLOCK;  /* queue is full */
    }
    buffer[head] = data;
    head = NEXT(head);
    return 0;
}

int dequeue_nonblock(void **data) {
    *data = buffer[tail];
    if (NULL == *data) {
        return EWOULDBLOCK;  /* queue is empty */
    }
    buffer[tail] = NULL;
    tail = NEXT(tail);
    return 0;
}

Both versions require a pre-allocated array for the data. My question is: is there any single-consumer single-producer lock-free queue implementation which uses malloc() to allocate space dynamically?

And a related question: how can I measure the exact overhead of queue synchronization, such as how much time a pthread_mutex_lock() call takes?
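One rough way I've thought of measuring the uncontended cost is to time a tight lock/unlock loop with clock_gettime() (just a sketch; the iteration count is arbitrary and this ignores contention between the two threads):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 1000000L

int main(void) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&m);    /* uncontended: no other thread touches m */
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    printf("avg lock+unlock: %.1f ns (uncontended)\n", ns / ITERATIONS);
    return 0;
}

But I suspect the contended case is what really matters for a producer/consumer pair, and that seems harder to isolate.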

asked Jun 13 '09 by ZelluX

2 Answers

I think the allocator can be a performance problem. You can try to use a custom multithreaded memory allocator that uses a linked list for maintaining freed blocks. If your blocks are not (nearly) the same size, you can implement a "buddy system" memory allocator, which is very fast. You still have to synchronise your queue (ring buffer) with a mutex.
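A minimal sketch of the free-list idea, assuming fixed-size blocks and a mutex around the list (the names and the block size are illustrative, not from any particular library):

#include <pthread.h>
#include <stdlib.h>

#define BLOCK_SIZE 64               /* all blocks the same size (illustrative) */

typedef struct block { struct block *next; } block_t;

static block_t        *free_list = NULL;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

void *pool_alloc(void) {
    pthread_mutex_lock(&pool_lock);
    block_t *b = free_list;
    if (b)
        free_list = b->next;        /* reuse a previously freed block */
    pthread_mutex_unlock(&pool_lock);
    return b ? (void *)b : malloc(BLOCK_SIZE);  /* fall back to malloc when empty */
}

void pool_free(void *p) {
    block_t *b = p;
    pthread_mutex_lock(&pool_lock);
    b->next = free_list;            /* push the block back onto the free list */
    free_list = b;
    pthread_mutex_unlock(&pool_lock);
}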

To avoid too much synchronisation, you can try writing/reading multiple values to/from the queue on each access.
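For example, the consumer could drain several items in one critical section instead of locking per element (a sketch; queue_t, q->lock, queue_empty() and queue_pop() are placeholders for whatever your queue provides):

/* Dequeue up to `max` items under a single lock acquisition. */
size_t dequeue_batch(queue_t *q, void **out, size_t max) {
    size_t n = 0;
    pthread_mutex_lock(&q->lock);
    while (n < max && !queue_empty(q))
        out[n++] = queue_pop(q);
    pthread_mutex_unlock(&q->lock);
    return n;                        /* number of items actually dequeued */
}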

If you still want to use lock-free algorithms, then you must use pre-allocated data or use a lock-free allocator. There is a paper about a lock-free allocator, "Scalable Lock-Free Dynamic Memory Allocation", and an implementation, Streamflow.

Before starting with lock-free stuff, look at: Circular lock-free buffer

answered Sep 19 '22 by bill


Adding malloc would kill any performance gain you might make, and a lock-based structure would be just as effective. That's because malloc requires some sort of CAS lock over the heap, and some implementations of malloc have their own lock, so you may end up blocking in the memory manager anyway.

To use malloc you would need to pre-allocate all the nodes and manage them with another queue...
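One way to read that suggestion (a sketch; node_t, spsc_queue_t, spsc_push and spsc_pop are hypothetical wrappers around a ring buffer like the one in the question): keep a second queue that holds only free nodes, so the producer "allocates" by popping from it and the consumer "frees" by pushing back onto it.

/* Node recycling via a second SPSC queue.
   free_q flows consumer -> producer, data_q flows producer -> consumer,
   so each queue still has exactly one producer and one consumer. */
node_t *alloc_node(spsc_queue_t *free_q) {
    return spsc_pop(free_q);         /* NULL means the node pool is exhausted */
}

void recycle_node(spsc_queue_t *free_q, node_t *n) {
    spsc_push(free_q, n);            /* hand the node back instead of calling free() */
}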

Note that you can make some form of expandable array, which would need to lock only when it is expanded.

Also, while interlocked operations are lock-free on the CPU, they do place a memory lock and block memory for the duration of the instruction, and they often stall the pipeline.

answered Sep 21 '22 by ben