
valgrind stalls in multithreaded socket program

I'm running a multithreaded socket program under Valgrind. The client sends a request to the server over TCP and then busy-waits on a boolean. The boolean is set when the callback function that services the server's response is called. Once the response has been received (and the boolean flag is set), the client sends the next request, and this repeats in a loop.

I realise that unsynchronised access to shared variables (the boolean) can cause threading issues, but I've tried using pthread mutexes, and the program slows down by about 20% (speed is important here). I'm confident that writing to the shared boolean variable is fine, as it can be done in a single cycle.

The program runs fine outside of Valgrind but often stalls when run under it. I left the program running overnight; it usually takes a few seconds to complete, so I don't think it's a case of not waiting long enough for it to finish. The threading is managed by the open source engine framework (QuickFIX), so I don't think the problem lies in how the threads are created or managed.

Does anyone know of any problems with Valgrind around multithreaded programs, busy-wait loops, or socket communication (or a combination of these)?

Trent asked Dec 29 '11 01:12


4 Answers

While other answers focus on insisting that you take the standard synchronization approach (something I fully agree with), I thought instead I should answer your question regarding Valgrind.

As far as I know, there are no issues with running Valgrind in a multi-threaded environment. I believe Valgrind serialises the application's threads so that only one runs at a time, effectively on a single core, but other than that it should not affect your threads.

What Valgrind is probably doing to your application is altering the timings and interactions between your threads in ways that might be exposing bugs and race conditions in your code that you don't normally see while running stand-alone.

The same logic you applied to decide that the bug could not be in the open source threading framework you are using also applies to Valgrind in my opinion. I recommend that you consider these hangs as bugs in your code and debug them as such, because that is most likely what they are.

As a side note, using a mutex is probably overkill for the problem you described. You should investigate semaphores or condition variables instead.
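To illustrate the condition-variable suggestion, here is a minimal sketch in C++11, assuming the response callback runs on a different thread from the one sending requests. The names `ResponseSignal` and `run_round_trips` are hypothetical and not part of any framework; the short-lived thread merely stands in for the framework thread that would invoke the callback.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a reusable "response arrived" signal.
class ResponseSignal {
public:
    // Called from the callback thread when the response arrives.
    void notify() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_ = true;
        }
        cv_.notify_one();
    }

    // Called by the requesting thread. Sleeps until notify() has run,
    // then resets the flag so the next request can reuse the signal.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return ready_; });
        assert(ready_);  // the predicate guards against spurious wake-ups
        ready_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool ready_ = false;
};

// Runs `rounds` simulated request/response cycles and returns how many
// completed. The spawned thread stands in for the callback thread.
int run_round_trips(int rounds) {
    ResponseSignal response;
    int completed = 0;
    for (int i = 0; i < rounds; ++i) {
        std::thread callback([&response] { response.notify(); });
        response.wait();  // blocks without burning CPU
        ++completed;
        callback.join();
    }
    return completed;
}
```

Because the waiting thread sleeps instead of spinning, it does not compete with the callback thread for CPU time, which matters when a tool such as Valgrind lets only one thread run at a time.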

Good luck.

Miguel answered Oct 14 '22 09:10


I just had a similar problem. Like the OP I had one thread doing a busy wait. In my case the problem was that the busy wait was taking almost all the CPU cycles and causing the other threads to run many thousands of times slower. At first I fixed this by putting a usleep(1) in the busy wait loop (only for Valgrind builds). Then I read the Valgrind manual and learned of the --fair-sched=yes option, which also fixed the problem and allowed me to remove the usleep(1).
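A rough sketch of the first workaround, assuming the flag is a `std::atomic<bool>` and using `std::this_thread::sleep_for` in place of `usleep(1)`. The function names `wait_with_backoff` and `demo_wait_with_backoff` are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Polls `flag` until it becomes true, sleeping briefly on every pass so
// that other threads get a chance to run. Returns the number of passes.
long wait_with_backoff(const std::atomic<bool>& flag) {
    long passes = 0;
    while (!flag.load(std::memory_order_acquire)) {
        std::this_thread::sleep_for(std::chrono::microseconds(1));
        ++passes;
    }
    return passes;
}

// Hypothetical demo: a second thread sets the flag after a short delay.
bool demo_wait_with_backoff() {
    std::atomic<bool> done{false};
    std::thread setter([&done] {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        done.store(true, std::memory_order_release);
    });
    wait_with_backoff(done);
    assert(done.load());  // the loop only exits once the flag is set
    setter.join();
    return done.load();
}
```

With `--fair-sched=yes` passed to Valgrind, as described above, the sleep should not be needed, since the scheduler then hands the CPU to waiting threads more evenly.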

Andrew Bainbridge answered Oct 14 '22 11:10


Reading/writing a boolean is not an atomic operation on x86.

See my question here: Is volatile a proper way to make a single byte atomic in C/C++?

Axel Gneiting answered Oct 14 '22 09:10


Even if writing your boolean is an atomic operation, the compiler and the CPU are free to re-order the update around other memory accesses. Your busy-waiting thread may awake from the busy loop and discover that the shared data structure has not actually been updated yet.
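As a sketch of how the re-ordering problem can be avoided while still spinning on a flag, the C++11 example below publishes the data with a release store and reads it after an acquire load. The `Mailbox` type and the function names are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Mailbox {
    int payload = 0;                 // ordinary, non-atomic shared data
    std::atomic<bool> ready{false};  // publication flag
};

// Writer: fill in the data first, then publish it with a release store.
// Writes made before the store cannot be re-ordered after it.
void publish(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Reader: spin until the acquire load observes the flag. Once it does,
// everything written before the matching release store is visible.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Hypothetical demo: one thread publishes, the caller consumes.
int round_trip(int value) {
    Mailbox box;
    std::thread writer([&box, value] { publish(box, value); });
    int seen = consume(box);
    assert(seen == value);  // guaranteed by the release/acquire pairing
    writer.join();
    return seen;
}
```

A plain `bool` gives neither guarantee: the compiler may hoist the read out of the loop, and nothing orders the payload write relative to the flag write.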

I strongly recommend sticking to the threading primitives available to you to write consistent programs that execute exactly as you want them to, every single time.

sarnold answered Oct 14 '22 10:10