Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to debug a Linux kernel that freezes during boot?

I have a legacy device with a binary Linux 2.6.18 kernel that boots normally to its rootfs. However, if I try to compile this kernel from the source, the resulting kernel binary will freeze during the boot. I don't have the .config file used to build the previous kernel binary that is currently booting normally.

The boot is freezing and no error output is provided. Here is the boot log:

Linux version 2.6.18-6.2 (myuser@host) (gcc version 4.2.0 20070124 (prerelease) - BRCM 10ts-20080721) #10 SMP Sun Apr 28 18:25:24 BRT 2013
Fetching vars from bootloader... OK (E,d,B,C)
Detected 512 MB on MEMC0 (strap 0x23430310)
Board strapped at 512 MB, default is 256 MB
Options: sata=1 enet=1 emac_1=1 no_mdio=0 docsis=0 ebi_war=0 pci=1 smp=1
CPU revision is: 0002a044
FPU revision is: 00130001
Primary instruction cache 32kB, physically tagged, 2-way, linesize 64 bytes.
Primary data cache 64kB, 4-way, linesize 64 bytes.
<6>Synthesized TLB refill handler (23 instructions).
<6>Synthesized TLB load handler fastpath (37 instructions).
<6>Synthesized TLB store handler fastpath (37 instructions).
<6>Synthesized TLB modify handler fastpath (36 instructions).
Determined physical RAM map:
 memory: 10000000 @ 00000000 (usable)
 memory: 10000000 @ 20000000 (usable)
Using 32MB for memory, overwrite by passing mem=xx
User-defined physical RAM map:
node [00000000, 02000000: RAM]
node [02000000, 0e000000: RSVD]
node [20000000, 10000000: RAM]
<5>Reserving 224 MB upper memory starting at 02000000
<7>On node 0 totalpages: 65536
<7>  DMA zone: 65536 pages, LIFO batch:15
<7>On node 1 totalpages: 65536
<7>  Normal zone: 65536 pages, LIFO batch:15
Built 2 zonelists.  Total pages: 131072
<5>Kernel command line: root=/dev/mtdblock3 rw rootfstype=jffs2 console=ttyS0,115200
PID hash table entries: 4096 (order: 12, 16384 bytes)
mips_counter_frequency = 202000000 from Calibration, = 202500000 from header(CPU_MHz/2)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 286336k/524288k available (2924k kernel code, 237760k reserved, 544k data, 164k init, 0k highmem)
Mount-cache hash table entries: 512
Checking for 'wait' instruction...  available.
plat_prepare_cpus: ENABLING 2nd Thread...
TP0: prom_boot_secondary: Kick off 2nd CPU...
CPU revision is: 0002a044
FPU revision is: 00130001
Primary instruction cache 32kB, physically tagged, 2-way, linesize 64 bytes.
Primary data cache 64kB, 4-way, linesize 64 bytes.
Synthesized TLB refill handler (23 instructions).
Brought up 2 CPUs
migration_cost=1000
NET: Registered protocol family 16
registering PCI controller with io_map_base unset
registering PCI controller with io_map_base unset
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
NET: Registered protocol family 2
IP route cache hash table entries: 16384 (order: 4, 65536 bytes)
TCP established hash table entries: 65536 (order: 7, 524288 bytes)
TCP bind hash table entries: 32768 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 65536 bind 32768)
TCP reno registered
brcm-pm: disabling power to USB block
brcm-pm: disabling power to ENET block
brcm-pm: disabling power to SATA block
squashfs: version 3.2-r2 (2007/01/15) Phillip Lougher
JFFS2 version 2.2. (NAND) (SUMMARY)  (C) 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Serial: 8250/16550 driver $Revision: 1.1.1.1 $ 3 ports, IRQ sharing disabled
serial8250: ttyS0 at MMIO 0x0 (irq = 22) is a 16550A
serial8250: ttyS1 at MMIO 0x0 (irq = 66) is a 16550A
serial8250: ttyS2 at MMIO 0x0 (irq = 67) is a 16550A
loop: loaded (max 8 devices)
brcm-pm: enabling power to ENET block

How do I go about debugging this? Any insights on possible solutions to the freeze are welcome as well.

like image 842
ivarec Avatar asked Apr 28 '13 21:04

ivarec


3 Answers

One way to deal with this is to enable CONFIG_EARLY_PRINTK and add some printk() statements in kernel code that you suspect is freezing (most likely some drivers configuration parameters are wrong).

Also, you might be able to get old kernel config by looking at /boot/config-*, or at /proc/config.gz (it will exist only if old kernel had option CONFIG_IKCONFIG_PROC enabled).

like image 67
mvp Avatar answered Oct 20 '22 20:10

mvp


Add initcall_debug to CONFIG_CMDLINE (kernel command line).

CONFIG_CMDLINE="root=/dev/ram0 rw mem=512M@0x0 initrd=0x800000,16M console=ttyS0,38400n8 rootfstype=ext2 init=/bin/busybox init -s initcall_debug"
like image 2
sailfish009 Avatar answered Oct 20 '22 19:10

sailfish009


There are some debugger options like kdb and kgdb, but I've always found them flaky and temperamental. Probably more-so if you can't even get your machine to boot. I concur with the CONFIG_EARLY_PRINTK advise, and would advise you to make sure you get kernel output on boot (not "quiet"), but it seems you have this already.

The "GPIO" suggestion above could work - but is very system-dependent and cumbersome. That said, I think you want an answer better than "Start adding a lot of printk's". You can start with the offending ethernet driver (BRC-PM?) or try removing that to see if that's related.

It'll take some investigation - sorry, but no "magic bullet"! :-O

like image 1
Brad Avatar answered Oct 20 '22 19:10

Brad