
Thread related issues and debugging them

This is my follow-up to the previous post on memory management issues. These are the threading issues I know of:

1) data races (atomicity violations and data corruption)

2) ordering problems

3) misuse of locks leading to deadlocks

4) heisenbugs

Are there any other issues with multithreading? How can they be solved?

brett asked Aug 18 '10 18:08


4 Answers

Eric's list of four issues is pretty much spot on. But debugging these issues is tough.

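For deadlock, I've always favored "leveled locks". Essentially, you give each type of lock a level number and then require that a thread acquire locks in strictly increasing level order. Since every thread then takes locks in the same order, no cycle of waiting threads can form.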

To do leveled locks, you can declare a structure like this:

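#include <assert.h>

/* os_mutex, os_lock_acquire and __tls stand in for your platform's
   mutex type, lock call and thread-local storage qualifier. */
typedef struct my_lock_struct {
   os_mutex actual_lock;
   int level;
   struct my_lock_struct *prev_lock_in_thread;
} my_lock_struct;

/* Most recently acquired lock of the current thread. */
static __tls my_lock_struct *last_lock_in_thread;

void my_lock_acquire(int level, my_lock_struct *lock) {
    /* A thread may only take a lock with a higher level than the last one it took. */
    if (last_lock_in_thread != NULL) assert(last_lock_in_thread->level < level);
    os_lock_acquire(&lock->actual_lock);
    lock->level = level;
    lock->prev_lock_in_thread = last_lock_in_thread;
    last_lock_in_thread = lock;
}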

What's cool about leveled locks is that any possibility of deadlock trips an assertion, even on a run where the deadlock itself would not have happened. And with some extra magic using __func__ and __LINE__, you know exactly what badness your thread did.
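As a rough sketch of how this could look with POSIX threads, the version below adds a release function and a macro that passes __func__ and __LINE__ to the check. All leveled_lock names are made up for illustration, and unlike the snippet above it fixes the level when the lock is created rather than at acquire time; it also assumes locks are released in reverse order of acquisition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; only the pthread calls are real API. */
typedef struct leveled_lock {
    pthread_mutex_t mutex;
    int level;                       /* fixed when the lock is created */
    struct leveled_lock *prev_held;  /* lock this thread held before this one */
} leveled_lock;

/* Most recently acquired lock of the calling thread (C11 thread-local). */
static _Thread_local leveled_lock *last_held = NULL;

void leveled_lock_init(leveled_lock *l, int level) {
    pthread_mutex_init(&l->mutex, NULL);
    l->level = level;
    l->prev_held = NULL;
}

/* Returns 1 if acquiring l now would keep this thread's levels strictly increasing. */
int leveled_lock_order_ok(const leveled_lock *l) {
    return last_held == NULL || last_held->level < l->level;
}

/* Checks the ordering rule first, then blocks on the mutex. */
void leveled_lock_acquire_at(leveled_lock *l, const char *func, int line) {
    if (!leveled_lock_order_ok(l)) {
        fprintf(stderr, "%s:%d: lock level %d requested while holding level %d\n",
                func, line, l->level, last_held->level);
        abort();
    }
    pthread_mutex_lock(&l->mutex);
    l->prev_held = last_held;
    last_held = l;
}

/* Assumption: locks are released in reverse order of acquisition. */
void leveled_lock_release(leveled_lock *l) {
    assert(last_held == l);
    last_held = l->prev_held;
    l->prev_held = NULL;
    pthread_mutex_unlock(&l->mutex);
}

/* Records the call site so a violation points at the offending line. */
#define LEVELED_LOCK_ACQUIRE(l) leveled_lock_acquire_at((l), __func__, __LINE__)
```

A thread that holds a level 2 lock and then calls LEVELED_LOCK_ACQUIRE on a level 1 lock aborts with the function name and line number of that call, whether or not another thread happened to be contending at the time.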

For data races and lack of synchronization, the current situation is pretty poor. There are static tools that try to identify these issues, but their false-positive rates are high.

The company I work for (http://www.corensic.com) has a new product called Jinx that actively looks for cases where race conditions can be exposed. This is done by using virtualization technology to control the interleaving of threads on the various CPUs and zooming in on communication between CPUs.

Check it out. You probably have a few more days to download the Beta for free.

Jinx is particularly good at finding bugs in lock free data structures. It also does very well at finding other race conditions. What's cool is that there are no false positives. If your code testing gets close to a race condition, Jinx helps the code go down the bad path. But if the bad path doesn't exist, you won't be given false warnings.

Dave Dunn answered Sep 28 '22 05:09


Unfortunately there's no good pill that helps automatically solve most/all threading issues. Even unit tests that work so well on single-threaded pieces of code may never detect an extremely subtle race condition.

One thing that will help is keeping the thread-interaction data encapsulated in objects. The smaller the interface/scope of the object, the easier it will be to detect errors in review (and possibly testing, but race conditions can be a pain to detect in test cases). If the interface is simple and hard to misuse, clients that use it will be correct by default. By building up a bigger system from lots of smaller pieces (only a handful of which actually do thread-interaction), you can go a long way towards averting threading errors in the first place.
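To make the encapsulation idea concrete, here is a minimal sketch in C with POSIX threads. The safe_counter type and its functions are invented for this example; the point is only that the shared value and its mutex live together, and callers can reach the value only through two small functions that take the lock themselves.

```c
#include <pthread.h>

/* Hypothetical example type: the shared state and its lock live in one struct. */
typedef struct {
    pthread_mutex_t mutex;
    long value;
} safe_counter;

void safe_counter_init(safe_counter *c) {
    pthread_mutex_init(&c->mutex, NULL);
    c->value = 0;
}

/* The only way to modify the value; locking is not the caller's job. */
void safe_counter_add(safe_counter *c, long delta) {
    pthread_mutex_lock(&c->mutex);
    c->value += delta;
    pthread_mutex_unlock(&c->mutex);
}

/* The only way to read the value. */
long safe_counter_get(safe_counter *c) {
    pthread_mutex_lock(&c->mutex);
    long v = c->value;
    pthread_mutex_unlock(&c->mutex);
    return v;
}

/* Worker used by the demo below: adds 1 to the counter 100000 times. */
static void *add_many(void *arg) {
    for (int i = 0; i < 100000; i++) safe_counter_add(arg, 1);
    return NULL;
}

/* Runs two workers against one counter and returns the final value. */
long run_two_adders(void) {
    safe_counter c;
    pthread_t t1, t2;
    safe_counter_init(&c);
    pthread_create(&t1, NULL, add_many, &c);
    pthread_create(&t2, NULL, add_many, &c);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    long total = safe_counter_get(&c);
    pthread_mutex_destroy(&c.mutex);
    return total;
}
```

A reviewer only has to check the three safe_counter functions to be convinced the counter is used safely; the worker code never touches the mutex.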

Mark B answered Sep 28 '22 07:09


The four most common problems with threading are:

1-Deadlock
2-Livelock
3-Race Conditions
4-Starvation

Eric answered Sep 28 '22 05:09


How to solve [issues with multi threading]?

A good way to "debug" MT applications is through logging. A good logging library with extensive filtering options makes it easier. Of course, logging itself influences the timing, so you can still have "heisenbugs", but that's much less likely than when you're actually breaking into the debugger.

Prepare and plan for that. Include a good logging facility into your application from the start.
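As a very small sketch of what such a facility might record, the C code below tags every log line with a global sequence number, a per-thread id, and the calling function and line. The names log_write, log_set_thread_id and LOG are made up here, it assumes POSIX threads and C11 thread-local storage, and it has none of the filtering a real logging library would offer.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical minimal logger; only the pthread and stdio calls are real API. */
static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long log_seq = 0;            /* total order of log records */
static _Thread_local int log_thread_id = 0;  /* each thread sets its own id */

void log_set_thread_id(int id) { log_thread_id = id; }

/* Writes one complete record under the mutex so lines from different
   threads never interleave. Returns the sequence number assigned. */
unsigned long log_write(FILE *out, const char *func, int line,
                        const char *fmt, ...) {
    va_list ap;
    unsigned long seq;
    pthread_mutex_lock(&log_mutex);
    seq = ++log_seq;
    fprintf(out, "[%lu] T%d %s:%d: ", seq, log_thread_id, func, line);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    pthread_mutex_unlock(&log_mutex);
    return seq;
}

/* Captures the call site automatically. */
#define LOG(...) log_write(stderr, __func__, __LINE__, __VA_ARGS__)
```

Each thread would call log_set_thread_id once at startup and then use LOG("queue depth=%d", depth) or similar. The sequence numbers give the order in which records were written, which is often what you need to reconstruct an interleaving afterwards; the mutex does add synchronization of its own, so the timing caveat above still applies.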

sbi answered Sep 28 '22 07:09