
How to debug deadlock problems in kernel

I have a buggy kernel module which I am trying to fix. When this module is running, it causes other tasks to hang for more than 120 seconds. Since almost all the hung tasks are waiting for either mm->mmap_sem or some file-system lock (inode->i_mutex), I suspect that the module does not take mmap_sem and a file-system-level lock (such as inode->i_mutex) in a consistent order, which could cause a deadlock. My module does not take those locks directly, though, so I assume some function I call takes them. I am now trying to figure out which function calls in my module are causing the problem.

However, I am having a hard time debugging it for the following reasons:

  1. I don't know exactly which lock the hung task is trying to grab. I have the call trace of the hung task and know at what point it hangs. The kernel also gives me information such as: "1 lock held by automount/3115: 0: (&type->i_mutex_dir_key#2){--..}, at: [] real_lookup+0x24/0xc5". However, to figure out the problem I need to know exactly which lock a task holds and exactly which lock it is trying to acquire. Because the kernel does not print function arguments along with the call trace, I find this information difficult to obtain.

  2. I am using gdb and VMware to debug this, which lets me set breakpoints, step into functions, and so on. However, which task hangs, and at what point, is non-deterministic, so I don't really know where to set breakpoints and inspect. It would be great if I could somehow "attach" to the task that the kernel reports as blocked for more than 120 seconds and get some information about it.

So my questions are as follows:

  1. How can I get, along with the call trace, the arguments of the functions in the trace, so that I can figure out exactly which lock a task is trying to grab?

  2. Is it possible to use gdb to somehow "attach" to a hung task in the kernel? If not, is there some way to at least examine the data structure that represents that task? I am also having a hard time examining global data structures in the kernel: gdb always complains "can't access memory 0x3200" or something similar.

  3. It would also be very helpful if I could print, for every task in the kernel, which locks it currently holds. Is there a way to do that?

Thank you very much!

yangsuli asked Feb 05 '12 05:02


2 Answers

Not answering your question directly, but hopefully this is more helpful: the Linux kernel has a built-in, heavy-duty lock validator called lockdep. Turn it on and let it run. If you have a lock-ordering problem, it is likely to catch it and give you a detailed report.

See: http://www.mjmwired.net/kernel/Documentation/lockdep-design.txt
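
To give a feel for what a lock-order validator checks, here is a minimal userspace sketch in C. It is a toy model, not kernel code and not lockdep's actual implementation: every name in it (order_graph, task_locks, acquire, release, MAX_CLASSES) is made up for illustration. It records "B was taken while A was held" edges between lock classes and refuses an acquisition that would contradict an order it has already seen.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of lock-order validation. This is NOT kernel code
 * and NOT how lockdep is implemented; all names are hypothetical. */
#define MAX_CLASSES 8

struct order_graph {
    /* edge[a][b] is true once class b was acquired while a was held */
    bool edge[MAX_CLASSES][MAX_CLASSES];
};

struct task_locks {
    int held[MAX_CLASSES];  /* stack of lock classes currently held */
    int depth;
};

/* Depth-first search: is there a recorded path from 'from' to 'to'? */
static bool reachable(const struct order_graph *g, int from, int to,
                      bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (g->edge[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    return false;
}

/* Record that task t acquires lock class cls. Returns false when the
 * acquisition contradicts an earlier ordering (a potential deadlock),
 * when cls is already held, or when the arguments are out of range. */
static bool acquire(struct order_graph *g, struct task_locks *t, int cls)
{
    if (cls < 0 || cls >= MAX_CLASSES || t->depth >= MAX_CLASSES)
        return false;
    for (int i = 0; i < t->depth; i++) {
        bool seen[MAX_CLASSES] = { false };
        /* We are about to add held -> cls; a path cls -> held
         * would close a cycle in the ordering graph. */
        if (reachable(g, cls, t->held[i], seen))
            return false;
    }
    for (int i = 0; i < t->depth; i++)
        g->edge[t->held[i]][cls] = true;
    t->held[t->depth++] = cls;
    return true;
}

/* Drop the most recently acquired lock class. */
static void release(struct task_locks *t)
{
    assert(t->depth > 0);
    t->depth--;
}
```

In this model, if one task takes class 0 and then class 1, a later task that holds class 1 and asks for class 0 gets false from acquire(), even though the two tasks never actually blocked each other. That is the appeal of order validation: the inversion is flagged the first time both orders are observed, without waiting for the hang to happen.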

gby answered Sep 28 '22 20:09


The kernel's lockdep feature can help you here. See my post on how to use it in your kernel: "How to use lockdep feature in linux kernel for deadlock detection".

brokenfoot answered Sep 28 '22 20:09