Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thread synchronization issue: possible race, misuse of volatile, cache coherency?

I am getting acquainted with pthreads programming; the following code is a simple producer/consumer design where data is put/fetched from a global list. The problem is: the data pointer in the consumer function is tried to be freed twice. Beside, if I add a printf() at the beginning of the loop, everything seems ok... What am I doing wrong? I suspect a misuse of the volatile keyword, or something hidden by the cache... Unless it's just a design issue (which probably is :p).

Thanks for your insights.

Note: malloc()/free() is thread-safe on my system. I am compiling with $ gcc -pthread -O0 which should, I guess, reduce possible design errors due to misuse of volatile. Finally, we don't care in this code snippet with running out of memory (in case of more producer than consumer).

Edit: Changed code to a single list head.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

pthread_mutex_t lock;
pthread_cond_t new_data;

struct data {
  int i;
  struct data *next;
};

struct data *list_head = NULL;

void *consume(void *arg)
{
  struct data *data;

  while (1) {
    pthread_mutex_lock(&lock);
    while (list_head == NULL) {
      pthread_cond_wait(&new_data, &lock);
    }
    data = list_head;
    list_head = list_head->next;
    pthread_mutex_unlock(&lock);
    free(data);
  }

  return NULL;
}

void *produce(void *arg)
{
  struct data *data;

  while (1) {
    data = malloc(sizeof(struct data));
    pthread_mutex_lock(&lock);
    data->next = list_head;
    list_head = data;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&new_data);
  }

  return NULL;
}

int main()
{
  pthread_t tid[2];
  int i;

  pthread_mutex_init(&lock, NULL);
  pthread_cond_init(&new_data, NULL);
  pthread_create(&tid[0], NULL, consume, NULL);
  pthread_create(&tid[1], NULL, produce, NULL);
  for (i = 0; i < 2; i++) {
    pthread_join(tid[i], NULL);
  }
}

And the output:

$ ./a.out 
*** glibc detected *** ./a.out: double free or corruption (fasttop): 0x00007f5870109000 ***
like image 345
Benoit Avatar asked Dec 26 '22 10:12

Benoit


1 Answers

Consider the following scenario:

  1. produce -> acquired lock
  2. consume -> wait lock
  3. produce -> allocate d0, write_ptr = d0, read_ptr = d0, signal, unlock
  4. consume -> acquired lock
  5. produce -> wait lock
  6. consume -> satisfied condition
  7. consume -> data = d0, read_ptr = NULL, unlock
  8. consume -> free d0
  9. produce -> acquired lock, allocate d1
  10. consume -> wait lock
  11. produce -> write_ptr != null so write_ptr->next = d1
  12. produce -> read_ptr == null so read_ptr = d1

Check out step 11. write_ptr is still d0 even though it was free'd independently of the producer thread. You need to make sure consume does not invalidate write_ptr.

A doubly linked list would allow you to avoid some of these difficulties (as the readers and writers work from separate ends).

Main:

  • Create sentinel nodes HEAD and TAIL, link HEAD and TAIL
  • Spawn producer
  • Spawn consumer

Producer:

  • Lock
  • Create node
  • Link HEAD->next->prev to node
  • Link node->next to HEAD->next
  • Link HEAD->next to node
  • Link node->prev to HEAD
  • Unlock
  • Signal

Consumer:

  • Lock
  • Wait for TAIL->prev != HEAD (do { pthread_cond_wait } while (condition);)
  • data = TAIL->prev
  • Link TAIL->prev to data->prev
  • Link TAIL->prev->next to TAIL
  • Unlock
  • Use data
like image 111
user7116 Avatar answered Dec 28 '22 23:12

user7116