
multiple threads able to get flock at the same time

Tags: c, linux, glibc, flock

I was under the impression that flock(2) is thread safe. I recently ran across a case in our code where multiple threads are able to hold a lock on the same file at once, even though they all synchronize by taking an exclusive lock with the C API flock. Process 25554 is a multi-threaded application with 20 threads; the number of threads holding a lock on the same file varies when the deadlock happens. The multi-threaded application testEvent is the writer to the file, whereas push is the reader. Unfortunately, lsof does not print the LWP value, so I cannot tell which threads are holding the lock. When the condition shown below occurs, both processes and their threads are stuck in the flock call, as shown by pstack or strace on PIDs 25569 and 25554. Any suggestions on how to overcome this on RHEL 4.x?

One update: flock does not misbehave all the time. I only hit this deadlock when the transmit rate of the messages exceeds roughly 2 MB/s; below that rate everything is fine. I kept num_threads = 20 and size_of_msg = 1000 bytes constant and varied only the number of messages sent per second, from 10 up to 100, which gives 20 * 1000 * 100 = 2,000,000 bytes per second. When I increase the number of messages to 150, the flock issue appears.

I would also like to ask for opinions on the flockfile C API.

 sudo lsof filename.txt
    COMMAND       PID     USER     FD       TYPE     DEVICE     SIZE   NODE       NAME
    push         25569    root     11u       REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     27uW      REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     28uW      REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     29uW      REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     30uW      REG      253.4      1079   49266853   filename.txt

This is the multithreaded test program that calls the write_data_lib_func library function.

void* sendMessage(void *arg) {
    int* numOfMessagesPerSecond = (int*) arg;
    std::cout << "Executing pthread id " << pthread_self() << std::endl;
    while (!terminateTest) {
        Record *er1 = Record::create();
        er1->setDate("some data");

        // Send one second's worth of messages, then sleep.
        for (int i = 0; i < *numOfMessagesPerSecond; i++) {
            ec = _write_data_lib_func(*er1);
            if (ec != SUCCESS) {
                std::cout << "write was not successful" << std::endl;
            }
        }
        delete er1;
        sleep(1);
    }

    return NULL;
}

The function above is run by each pthread created in the main function of the test.

for (i=0; i<_numThreads ; ++i) {
  rc = pthread_create(&threads[i], NULL, sendMessage, (void *)&_num_msgs);
  assert(0 == rc);

}

Here is the writer/reader source. For proprietary reasons I could not simply cut and paste it. The writer code is called from multiple threads within one process.

int write_data_lib_func(Record * rec) {
    if (fd == -1) {
        fd = open(fn, O_RDWR | O_CREAT | O_APPEND, 0666);
    }
    if (fd >= 0) {
        /* some code */

        if (flock(fd, LOCK_EX) < 0) {
            print "some error message";
        }
        else {
            if (maxfilesize) {
                off_t len = lseek(fd, 0, SEEK_END);
                ...
                ...
                ftruncate(fd, 0);
                ...
                lseek(fd, 0, SEEK_SET);
            } /* end of max spool size */
            if (writev(fd, rec) < 0) {
                print "some error message";
            }

            if (flock(fd, LOCK_UN) < 0) {
                print "some error message";
            }
        }
    }
}

The reader side is a daemon process with no threads.

int readData() {
    while (true) {
        if (fd == -1) {
            fd = open(filename, O_RDWR);
        }
        if (flock(fd, LOCK_EX) < 0) {
            print "some error message";
            break;
        }
        if ((n = read(fd, readBuf, readBufSize)) < 0) {
            print "some error message";
            break;
        }
        if (off < n) {
            if (off <= 0 && n > 0) {
                corrupt_file = true;
            }
            if (lseek(fd, off - n, SEEK_CUR) < 0) {
                print "some error message";
            }
            if (corrupt_file) {
                if (ftruncate(fd, 0) < 0) {
                    print "some error message";
                    break;
                }
            }
        }
        if (flock(fd, LOCK_UN) < 0) {
            print "some error message";
        }
    }
}
user1235176 asked Feb 27 '12 09:02


2 Answers

flock(2) is documented as "blocking if an incompatible lock is held by another process", and the man page states that "locks created by flock() are associated with an open file table entry". So it should be expected that flock locks taken by several threads of the same process do not interact (the flock documentation does not mention threads).

Hence, the solution should be simple: associate one pthread_mutex_t with every flock-able file descriptor, and protect the call to flock with that mutex. You might also use pthread_rwlock_t if you want read versus write locking.
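One way to read this advice is sketched below in C. The struct locked_file type and the locked_file_open, locked_file_append and locked_file_close helpers are hypothetical names made up for illustration, not part of the question's code. The sketch assumes the mutex should be held for the whole critical section, not only around the flock call itself, so that threads queue on the mutex and only one of them at a time talks to flock.

```c
#define _GNU_SOURCE             /* for flock() under strict -std= modes on glibc */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical wrapper: one mutex per flock-able descriptor. */
struct locked_file {
    int fd;
    pthread_mutex_t mu;
};

/* Open (or create) the file in append mode and set up its mutex. */
static int locked_file_open(struct locked_file *lf, const char *path)
{
    assert(lf != NULL && path != NULL);
    lf->fd = open(path, O_RDWR | O_CREAT | O_APPEND, 0666);
    if (lf->fd < 0)
        return -1;
    if (pthread_mutex_init(&lf->mu, NULL) != 0) {
        close(lf->fd);
        lf->fd = -1;
        return -1;
    }
    return 0;
}

/* Append one record. The mutex excludes other threads of this process;
 * flock() excludes other processes such as the reader. */
static ssize_t locked_file_append(struct locked_file *lf,
                                  const void *buf, size_t len)
{
    ssize_t n;

    assert(lf != NULL && buf != NULL);
    if (pthread_mutex_lock(&lf->mu) != 0)
        return -1;
    if (flock(lf->fd, LOCK_EX) < 0) {
        pthread_mutex_unlock(&lf->mu);
        return -1;
    }
    n = write(lf->fd, buf, len);
    flock(lf->fd, LOCK_UN);          /* drop the file lock first ... */
    pthread_mutex_unlock(&lf->mu);   /* ... then let the next thread in */
    return n;
}

/* Close the descriptor and destroy the mutex. */
static void locked_file_close(struct locked_file *lf)
{
    assert(lf != NULL);
    close(lf->fd);
    pthread_mutex_destroy(&lf->mu);
}
```

Each writer thread would call locked_file_append on a single shared struct locked_file instead of calling flock directly. Opening the file once, before the threads start, also avoids several threads racing on a shared fd == -1 check.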

Basile Starynkevitch answered Oct 13 '22 00:10


From the Linux man page for flock(2):

Locks created by flock() are associated with an open file table entry. This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these descriptors. Furthermore, the lock is released either by an explicit LOCK_UN operation on any of these duplicate descriptors, or when all such descriptors have been closed.

In addition, flock locks don't "stack": if you try to acquire a lock you already hold, the flock call is a no-op that returns immediately, without blocking and without changing the lock state in any way.

Since threads within a process share file descriptors, you can flock the file multiple times from different threads, and it won't block, as the lock is already held.
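The behaviour described above can be checked with a small single-process probe. This is a minimal sketch, assuming Linux and a local filesystem under /tmp; struct flock_probe, try_exclusive and probe_flock are hypothetical names used only for this illustration. It takes an exclusive lock through one descriptor, then retries with LOCK_NB through the same descriptor, through a dup(2) of it, and through a separate open(2) of the same file.

```c
#define _GNU_SOURCE             /* for flock() and mkstemp() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

struct flock_probe {
    int same_fd;      /* re-lock through the descriptor that holds the lock */
    int dup_fd;       /* lock through a dup(2) of that descriptor */
    int reopened_fd;  /* lock through a separate open(2) of the same file */
};

/* 1 = lock acquired, 0 = would have blocked, -1 = other error. */
static int try_exclusive(int fd)
{
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return 1;
    return errno == EWOULDBLOCK ? 0 : -1;
}

/* Fills *out and returns 0, or returns -1 if the setup failed. */
static int probe_flock(struct flock_probe *out)
{
    char path[] = "/tmp/flock_probe_XXXXXX";
    int fd, dupfd, fd2, rc = -1;

    assert(out != NULL);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    dupfd = dup(fd);               /* shares fd's open file table entry */
    fd2 = open(path, O_RDWR);      /* gets its own open file table entry */
    if (dupfd >= 0 && fd2 >= 0 && flock(fd, LOCK_EX) == 0) {
        out->same_fd = try_exclusive(fd);
        out->dup_fd = try_exclusive(dupfd);
        out->reopened_fd = try_exclusive(fd2);
        rc = 0;
    }
    if (fd2 >= 0)
        close(fd2);
    if (dupfd >= 0)
        close(dupfd);
    close(fd);
    unlink(path);
    return rc;
}
```

Under those assumptions, same_fd and dup_fd should come back as 1 and reopened_fd as 0. Threads that share one descriptor therefore never block each other in flock. A second open of the same file does conflict, even inside the same process.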

Also from the notes on flock(2):

flock() and fcntl(2) locks have different semantics with respect to forked processes and dup(2). On systems that implement flock() using fcntl(2), the semantics of flock() will be different from those described in this manual page.
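To illustrate how the semantics differ, here is a minimal sketch of the same kind of probe using fcntl(2) record locks, again assuming a local filesystem under /tmp; fcntl_write_lock and probe_fcntl are hypothetical names. POSIX record locks belong to the process, not to the open file table entry, so a second, independently opened descriptor in the same process is not refused.

```c
#define _GNU_SOURCE             /* for mkstemp() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Take a whole-file POSIX write lock without blocking; 0 on success. */
static int fcntl_write_lock(int fd)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns 1 if a second, independently opened descriptor in the same
 * process can also take the write lock, 0 if it is refused, and -1 if
 * the setup failed. */
static int probe_fcntl(void)
{
    char path[] = "/tmp/fcntl_probe_XXXXXX";
    int fd, fd2, result = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    fd2 = open(path, O_RDWR);
    if (fd2 >= 0 && fcntl_write_lock(fd) == 0)
        result = (fcntl_write_lock(fd2) == 0) ? 1 : 0;
    if (fd2 >= 0)
        close(fd2);
    close(fd);
    unlink(path);
    return result;
}
```

probe_fcntl should return 1, whereas the equivalent flock attempt through a separately opened descriptor would be refused. Neither kind of lock, on its own, serializes threads that share a descriptor.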

Chris Dodd answered Oct 12 '22 23:10