
How to interpret addresses in a kernel oops


I have a kernel oops in a Linux device driver I wrote. I want to determine which line is responsible for the oops. I have the following output, but I do not know how to interpret it.

Does it mean my code crashed at the instruction at write_func + 0x63? How can I relate the value in EIP to my own function? What do the values after the slash mean?

[10991.880354] BUG: unable to handle kernel NULL pointer dereference at   (null)
[10991.880359] IP: [<c06969d4>] iret_exc+0x7d0/0xa59
[10991.880365] *pdpt = 000000002258a001 *pde = 0000000000000000
[10991.880368] Oops: 0002 [#1] PREEMPT SMP
[10991.880371] last sysfs file: /sys/devices/platform/coretemp.3/temp1_input
[10991.880374] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc   hdrdmod(F) coretemp(F) af_packet fuse edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode dm_mod ppdev sg og3 ghes i2c_i801 igb hed pcspkr iTCO_wdt dca iTCO_vendor_support parport_pc floppy parport ext4 jbd2 crc16 i915 drm_kms_helper drm i2c_algo_bit video button fan processor thermal thermal_sys [last unloaded: preloadtrace]
[10991.880400]
[10991.880402] Pid: 4487, comm: python Tainted: GF           2.6.37.1-1.2-desktop #1 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
[10991.880408] EIP: 0060:[<c06969d4>] EFLAGS: 00210246 CPU: 0
[10991.880411] EIP is at iret_exc+0x7d0/0xa59
[10991.880413] EAX: 00000000 EBX: 00000000 ECX: 0000018c EDX: b7837000
[10991.880415] ESI: b7837000 EDI: 00000000 EBP: b7837000 ESP: e2a81ee0
[10991.880417]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[10991.880420] Process python (pid: 4487, ti=e2a80000 task=df940530 task.ti=e2a80000)
[10991.880422] Stack:
[10991.880423]  00000000 0000018c 00000000 0000018c e5e903dc e4616353 00000009 df99735c
[10991.880428]  df900a7c df900a7c b7837000 df80ad80 df99735c 00000009 e46182a4 e2a81f70
[10991.880433]  e28cd800 e09fc840 e28cd800 fffffffb e09fc888 c03718c1 e4618290 0000018c
[10991.880438] Call Trace:
[10991.882006] Inexact backtrace:
[10991.882006]
[10991.882012]  [<e4616353>] ? write_func+0x63/0x160 [mymod]
[10991.882017]  [<c03718c1>] ? proc_file_write+0x71/0xa0
[10991.882020]  [<c0371850>] ? proc_file_write+0x0/0xa0
[10991.882023]  [<c036c971>] ? proc_reg_write+0x61/0x90
[10991.882026]  [<c036c910>] ? proc_reg_write+0x0/0x90
[10991.882031]  [<c0323060>] ? vfs_write+0xa0/0x160
[10991.882034]  [<c03243c6>] ? fget_light+0x96/0xb0
[10991.882037]  [<c0323331>] ? sys_write+0x41/0x70
[10991.882040]  [<c0202f0c>] ? sysenter_do_call+0x12/0x22
[10991.882044]  [<c069007b>] ? _lock_kernel+0xab/0x180
[10991.882046] Code: f3 aa 58 59 e9 5a f9 d7 ff 8d 0c 88 e9 12 fa d7 ff 01 d9 e9 7b fa d7 ff 8d 0c 8b e9 73 fa d7 ff 01 d9 eb 03 8d 0c 8b 51 50 31 c0 <f3> aa 58 59 e9 cf fa d7 ff 01 d9 e9 38 fb d7 ff 8d 0c 8b e9 30
[10991.882069] EIP: [<c06969d4>] iret_exc+0x7d0/0xa59 SS:ESP 0068:e2a81ee0
[10991.882072] CR2: 0000000000000000
[10991.889660] ---[ end trace 26fe339b54b2ea3e ]---
Hans Then asked May 24 '13 09:05




1 Answer

All the information you need is right there:

[10991.880354] BUG: unable to handle kernel NULL pointer dereference at   (null) 

That's the reason.

[10991.880359] IP: [<c06969d4>] iret_exc+0x7d0/0xa59 

That's the instruction pointer at the time of fault. We'll get back to this momentarily.

[10991.880365] *pdpt = 000000002258a001 *pde = 0000000000000000 

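These are the page table entries covering the faulting address: the page directory pointer table entry (*pdpt) and the page directory entry (*pde), as used with PAE paging. Naturally, the latter is zero, since nothing is mapped at address 0, which is why a NULL pointer access faults. The above values are rarely useful (only in cases where you need to examine the physical memory mapping).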

[10991.880368] Oops: 0002 [#1] PREEMPT SMP 

That's the oops code (on x86, the page-fault error code), followed by the oops counter in brackets. PREEMPT SMP shows you the kernel is preemptible, and compiled for SMP, rather than UP. This is important for cases where the bug is from some race condition, etc.
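To make the oops code less opaque, here is a minimal Python sketch that decodes it, assuming it is the standard x86 page-fault error code (bit 0: present, bit 1: write, bit 2: user mode, bit 3: reserved bit, bit 4: instruction fetch). The names PF_BITS and decode_oops_code are made up for this illustration.

```python
# Bit meanings of the x86 page-fault error code printed after "Oops:".
# Each entry is (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a page table entry", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_text):
    """Return human-readable facts for an oops code such as '0002'."""
    code = int(code_text, 16)
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            facts.append(text)
    return facts


print(decode_oops_code("0002"))
```

For the 0002 in this report, that reads as a write, from kernel mode, to a page that was not present, which is consistent with a store through a NULL pointer.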

[10991.880371] last sysfs file: /sys/devices/platform/coretemp.3/temp1_input 

That's not necessarily the culprit, but oftentimes is. sysfs files are exported by various kernel modules, and oftentimes an I/O operation on a sysfs file leads to the faulty module code execution.

[10991.880374] Modules linked in: ... [last unloaded: preloadtrace] 

The kernel doesn't necessarily know which module is to blame, so it's giving you all of them. Also, it may very well be that a recently unloaded module didn't clean up and left some residue (like some timer, or callback) in the kernel - which is a classic case for oops or panic. So the kernel reports the last unloaded one, as well.

[10991.880402] Pid: 4487, comm: python Tainted: GF           2.6.37.1-1.2-desktop #1 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M. 

If the faulting thread is a user mode thread, you get the PID and command name (comm). "Tainted" flags are the kernel's way of saying the fault may not be the kernel's own (the kernel source is open and "pure"). "Taint" comes from the blasphemous non-GPL modules, among other causes.
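As a rough illustration, the taint letters can be looked up in a small table. The sketch below is deliberately partial: it covers only a handful of letters, and the dictionary and function names are invented for this example. The kernel's own documentation of tainted kernels has the full list for a given version.

```python
# Partial table of taint letters; not the complete set for any kernel version.
TAINT_FLAGS = {
    "P": "a proprietary (non-GPL) module has been loaded",
    "G": "all loaded modules have GPL-compatible licenses",
    "F": "a module was force-loaded",
    "W": "the kernel has previously issued a warning",
    "D": "the kernel has already died (a previous oops or BUG)",
}


def explain_taint(flags):
    """Map each taint letter to a description, skipping padding spaces."""
    return [
        (flag, TAINT_FLAGS.get(flag, "not in this partial table"))
        for flag in flags
        if not flag.isspace()
    ]


for flag, meaning in explain_taint("GF"):
    print(flag, "=", meaning)
```

For the "GF" shown here, no proprietary module is involved, but at least one module was force-loaded, which matches the (F) markers next to hdrdmod and coretemp in the module list.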

[10991.880408] EIP: 0060:[<c06969d4>] EFLAGS: 00210246 CPU: 0
[10991.880411] EIP is at iret_exc+0x7d0/0xa59

That gives you the faulting instruction pointer, both directly and in symbol+offset form. The part after the slash is the size of the function.
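The symbol+offset/size notation can be taken apart mechanically. The following Python sketch (the regular expression and the parse_frame helper are written for this illustration, not taken from any kernel tool) splits such an entry and recovers the start address of the function by subtracting the offset from the absolute address.

```python
import re

# Matches "[<addr>] symbol+0xoff/0xsize", with an optional "? " marker
# and an optional trailing "[module]" name.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frame(line):
    """Return the parts of a symbol+offset/size entry, or None if absent."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "module": m.group("mod"),
        "address": addr,
        "offset": off,
        "size": int(m.group("size"), 16),
        "start": addr - off,  # where the function begins in memory
    }


for line in (
    "[10991.880411] EIP is at iret_exc+0x7d0/0xa59",
    "[10991.880359] IP: [<c06969d4>] iret_exc+0x7d0/0xa59",
    "[10991.882012]  [<e4616353>] ? write_func+0x63/0x160 [mymod]",
):
    frame = parse_frame(line)
    if frame is not None:
        print(f"{frame['symbol']}: start {frame['start']:#x}, "
              f"offset {frame['offset']:#x} of {frame['size']:#x} bytes")
```

So write_func+0x63/0x160 means 0x63 bytes into a function that is 0x160 bytes long. If the module was built with debug information, an expression of the form `list *(write_func+0x63)` in gdb on the module's object file will usually map that offset back to a source line. Note that addresses in the call trace are return addresses, so the call itself sits just before that offset.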

[10991.880413] EAX: 00000000 EBX: 00000000 ECX: 0000018c EDX: b7837000
[10991.880415] ESI: b7837000 EDI: 00000000 EBP: b7837000 ESP: e2a81ee0
[10991.880417]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

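The registers are shown here. EAX is zero, but the NULL pointer being dereferenced is more likely EDI: the faulting instruction (f3 aa, a rep stos, marked <f3> in the Code: line) writes through EDI, which is also 00000000, and the oops code 0002 indicates a write.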

[10991.880420] Process python (pid: 4487, ti=e2a80000 task=df940530 task.ti=e2a80000)
[10991.880422] Stack:
[10991.880423]  00000000 0000018c 00000000 0000018c e5e903dc e4616353 00000009 df99735c
[10991.880428]  df900a7c df900a7c b7837000 df80ad80 df99735c 00000009 e46182a4 e2a81f70
[10991.880433]  e28cd800 e09fc840 e28cd800 fffffffb e09fc888 c03718c1 e4618290 0000018c

The area near the stack pointer is displayed. The kernel has no idea what these values mean, but they are the same output as you'd get from gdb when dumping memory at $esp. So it's up to you to figure out what they are. (For example, c03718c1 is likely a kernel return address, so you can go to /proc/kallsyms to figure it out, or rely on it being in the trace, as it is, next.) This tells you that all the data up to it is the stack frame.
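Spotting return addresses in the dump can be done by checking each word against known function ranges. The sketch below is only an illustration: the two ranges are derived from the call trace entries write_func+0x63/0x160 and proc_file_write+0x71/0xa0, the dump is assumed to start at ESP with 4-byte words, and the helper name annotate_stack is invented here. On a live system the function start addresses would come from /proc/kallsyms instead.

```python
# (start, size, name) for functions of interest, derived from the call trace:
# start = return address - offset, size = the number after the slash.
SYMBOLS = [
    (0xe4616353 - 0x63, 0x160, "write_func [mymod]"),
    (0xc03718c1 - 0x71, 0xa0, "proc_file_write"),
]

STACK_DUMP = """
00000000 0000018c 00000000 0000018c e5e903dc e4616353 00000009 df99735c
df900a7c df900a7c b7837000 df80ad80 df99735c 00000009 e46182a4 e2a81f70
e28cd800 e09fc840 e28cd800 fffffffb e09fc888 c03718c1 e4618290 0000018c
"""

ESP = 0xe2a81ee0  # from the register dump


def annotate_stack(dump, symbols, esp):
    """Return (stack address, 'symbol+offset') for words inside a known function."""
    notes = []
    for index, word in enumerate(dump.split()):
        value = int(word, 16)
        for start, size, name in symbols:
            if start <= value < start + size:
                notes.append((esp + 4 * index, f"{name}+{value - start:#x}"))
    return notes


for address, note in annotate_stack(STACK_DUMP, SYMBOLS, ESP):
    print(f"{address:#x}: {note}")
```

Words such as e46182a4 and e4618290 are close to the module's code but fall outside write_func, so they are left unannotated. They may point into the module's data, but the sketch makes no claim about them.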

Now, because you have the stack call trace, you can put the fragments together:

[10991.880423]  00000000 0000018c 00000000 0000018c e5e903dc e4616353 --> back to write_func
[            ]  ..................................................... 00000009 df99735c
[10991.880428]  df900a7c df900a7c b7837000 df80ad80 df99735c 00000009 e46182a4 e2a81f70
[10991.880433]  e28cd800 e09fc840 e28cd800 fffffffb e09fc888 c03718c1  --> back to proc_file_write

[10991.882046] Code: f3 aa 58 59 e9 5a f9 d7 ff 8d 0c 88 e9 12 fa d7 ff 01 d9 e9 7b fa d7 ff 8d 0c 8b e9 73 fa d7 ff 01 d9 eb 03 8d 0c 8b 51 50 31 c0 <f3> aa 58 59 e9 cf fa d7 ff 01 d9 e9 38 fb d7 ff 8d 0c 8b e9 30

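Again, the kernel can't disassemble for you (it's oopsing, and might very well panic, give it a break!). But you can use gdb to disassemble these bytes. The byte in angle brackets (<f3>) is the first byte of the faulting instruction, the one EIP points at.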
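Before handing the bytes to a disassembler, it helps to pull them out of the Code: line and locate the faulting byte. This is a small Python sketch under the assumption that the line has the format shown above; parse_code_line is a name invented for this example.

```python
CODE_LINE = (
    "[10991.882046] Code: f3 aa 58 59 e9 5a f9 d7 ff 8d 0c 88 e9 12 fa d7 ff "
    "01 d9 e9 7b fa d7 ff 8d 0c 8b e9 73 fa d7 ff 01 d9 eb 03 8d 0c 8b 51 50 "
    "31 c0 <f3> aa 58 59 e9 cf fa d7 ff 01 d9 e9 38 fb d7 ff 8d 0c 8b e9 30"
)


def parse_code_line(line):
    """Return (code bytes, index of the byte at EIP) from a 'Code:' line."""
    tokens = line.split("Code:", 1)[1].split()
    data = bytearray()
    fault_index = None
    for token in tokens:
        if token.startswith("<") and token.endswith(">"):
            fault_index = len(data)  # the bracketed byte is where EIP points
            token = token[1:-1]
        data.append(int(token, 16))
    return bytes(data), fault_index


code, fault_index = parse_code_line(CODE_LINE)
print("bytes before EIP:", fault_index)
print("bytes at EIP:", code[fault_index:fault_index + 2].hex(" "))
```

The two bytes at EIP, f3 aa, encode rep stos with a byte operand, which stores AL to the address in EDI, ECX times. To decode the rest, the bytes from code[fault_index:] can be written to a file and given to a disassembler (for example objdump in raw binary mode for i386), and the kernel source tree also ships a scripts/decodecode helper for exactly this kind of line.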

So now you know everything. You can actually disassemble your own module and figure out where exactly in write_func the NULL pointer is dereferenced. (You're probably passing it as an argument to some function).

Technologeeks answered Oct 08 '22 11:10
