
Why does an empty critical section within a netfilter hook cause `BUG: scheduling while atomic`?

I've written this hook:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/skbuff.h>
#include <linux/mutex.h>

static struct nf_hook_ops nfho;
static struct mutex critical_section;

unsigned int hook_func(unsigned int hooknum,
   struct sk_buff **skb,
   const struct net_device *in,
   const struct net_device *out,
   int (*okfn)(struct sk_buff *)) {

  mutex_lock(&critical_section);

  mutex_unlock(&critical_section);

  return NF_ACCEPT;
}

int init_module() {

  nfho.hook = hook_func;
  nfho.hooknum = NF_INET_PRE_ROUTING;
  nfho.pf = PF_INET;
  nfho.priority = NF_IP_PRI_FIRST;

  mutex_init(&critical_section);

  nf_register_hook(&nfho);

  return 0;
}

void cleanup_module() {
  nf_unregister_hook(&nfho);
}

init section:

  mutex_init(&queue_critical_section);
  mutex_init(&ioctl_critical_section);

I have defined a static variable:

static struct mutex queue_critical_section;

Since there is no code between the lock and the unlock, I expect no error, but when I run this module the kernel produces these errors:

Error Updated:

root@khajavi: # pppd call 80-2
[  519.722190] PPP generic driver version 2.4.2
root@khajavi:# [  519.917390] BUG: scheduling while atomic: swapper/0/0/0x10000100
[  519.940933] Modules linked in: ppp_async crc_ccitt ppp_generic slhc netfilter_mutex(P) nls_utf8 isofs udf crc_itu_t bnep    rfcomm bluetooth rfkill vboxsf(O) vboxvideo(O) drm]
[  520.022203] CPU 0 
[  520.023270] Modules linked in: ppp_async crc_ccitt ppp_generic slhc netfilter_mutex(P) nls_utf8 isofs udf crc_itu_t bnep rfcomm bluetooth rfkill vboxsf(O) vboxvideo(O) drm]
[  520.087002] 
[  520.088001] Pid: 0, comm: swapper/0 Tainted: P           O 3.2.51 #3 innotek GmbH VirtualBox/VirtualBox
[  520.130047] RIP: 0010:[<ffffffff8102d17d>]  [<ffffffff8102d17d>] native_safe_halt+0x6/0x8
[  520.135010] RSP: 0018:ffffffff81601ee8  EFLAGS: 00000246
[  520.140999] RAX: 0000000000000000 RBX: ffffffff810a4cfa RCX: ffffffffffffffbe
[  520.145972] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
[  520.158759] RBP: ffffffff81601ee8 R08: 0000000000000000 R09: 0000000000000000
[  520.163392] R10: 0000000000000400 R11: ffff88003fc13680 R12: 0000000000014040
[  520.172784] R13: ffff88003fc14040 R14: ffffffff81067fd2 R15: ffffffff81601e58
[  520.177767] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  520.188208] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  520.196486] CR2: 00007fff961a3f40 CR3: 0000000001605000 CR4: 00000000000006f0
[  520.201437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  520.212332] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  520.217155] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020)
[  520.228706] Stack:
[  520.234394]  ffffffff81601ef8
Message from syslogd@khajavi at Dec 22 17:45:46 ...
 kernel:[  520.228706] Stack:
 ffffffff81014857 ffffffff81601f28 ffffffff8100d2a3
[  520.255069]  ffffffffffffffff 0d64eb669fae50fc ffffffff81601f28 0000000000000000
[  520.269238]  ffffffff81601f38 ffffffff81358c39 ffffffff81601f78 ffffffff816acb8a
[  520.274148] Call Trace:
[  520.275573]  [<ffffffff81014857>] default_idle+0x49/0x81
[  520.278985]  [<ffffffff8100d2a3>] cpu_idle+0xbc/0xcf
[  520.291491]  [<ffffffff81358c39>] rest_init+0x6d/0x6f

Here is the complete syslog output: http://paste.ubuntu.com/6617614/

Milad Khajavi asked Nov 30 '13 09:11


3 Answers

This hook is called from inside the kernel, in atomic context. Sleeping, taking a semaphore, or any other blocking operation is not allowed there; you would be blocking the kernel itself!

If you want a synchronization object, you might try spin locks.

As this answer to a similar question states, mutex_lock can invoke the scheduler. The kernel then complains because you are trying to schedule another task while you are inside a critical section (the callback itself is effectively one big critical section).

For a similar case, check the thread "Understanding execution context of netfilter hooks".
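As a rough sketch of the spin-lock suggestion, the module from the question could be reworked as follows. It assumes the 3.2-era hook signature and `nf_register_hook()` (later kernels changed both), and `packets_seen` is a hypothetical piece of shared state added only so that the critical section protects something. `spin_lock_bh()` is used so that the same lock would also be safe to take from process context; the code builds only as a kernel module against matching headers.

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>

static struct nf_hook_ops nfho;

/* A spinlock never sleeps, so it may be taken in softirq context. */
static DEFINE_SPINLOCK(critical_section);

/* Hypothetical shared state, present only for illustration. */
static unsigned long packets_seen;

/* Assumption: hook prototype as used by 3.2-era kernels. */
static unsigned int hook_func(unsigned int hooknum,
                              struct sk_buff *skb,
                              const struct net_device *in,
                              const struct net_device *out,
                              int (*okfn)(struct sk_buff *))
{
        /* Keep the critical section short and free of blocking calls. */
        spin_lock_bh(&critical_section);
        packets_seen++;
        spin_unlock_bh(&critical_section);

        return NF_ACCEPT;
}

static int __init nf_spin_init(void)
{
        nfho.hook = hook_func;
        nfho.hooknum = NF_INET_PRE_ROUTING;
        nfho.pf = PF_INET;
        nfho.priority = NF_IP_PRI_FIRST;

        /* Propagate a registration failure instead of ignoring it. */
        return nf_register_hook(&nfho);
}

static void __exit nf_spin_exit(void)
{
        nf_unregister_hook(&nfho);
}

module_init(nf_spin_init);
module_exit(nf_spin_exit);
MODULE_LICENSE("GPL");
```

Anything that may sleep (for example a `GFP_KERNEL` allocation or `copy_to_user()`) still has to stay outside the locked region.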

Yousf answered Nov 15 '22 17:11


Even though mutex_lock() probably won't sleep in this case, it still might. Because it is called in atomic context, the error is raised.

Specifically, this is caused by mutex_lock() calling might_sleep(), which in turn may call __schedule().

If you do need to synchronize, use the appropriate primitives, e.g. spinlocks and RCU.

Hasturkun answered Nov 15 '22 16:11


You see this message if your task is scheduled while it holds a synchronization primitive, most probably a spinlock. Locking a spinlock increases preempt_count; when the scheduler detects that it is being asked to schedule with a raised preempt_count, it prints exactly that message:

/*
 * Print scheduling while atomic bug:
 */
static noinline void __schedule_bug(struct task_struct *prev)
{
        if (oops_in_progress)
                return;

        printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
                prev->comm, prev->pid, preempt_count());

        debug_show_held_locks(prev);
        print_modules();
        if (irqs_disabled())
                print_irqtrace_events(prev);
        dump_stack();
}

So you are probably holding a lock, or there is some lock you need to release.
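The last field of that message is the raw preempt_count, and splitting it into its parts hints at which kind of atomic context was active. Here is a small userspace sketch, assuming the 3.x-era bit layout (preempt depth in bits 0-7, softirq count in bits 8-15, hardirq count in bits 16-25) and assuming the x86 value 0x10000000 for PREEMPT_ACTIVE; `decode_preempt_count` and `struct preempt_fields` are hypothetical names used only here:

```c
/* Assumed layout for 3.x-era kernels; other versions may differ. */
#define PREEMPT_MASK   0x000000ffu
#define SOFTIRQ_MASK   0x0000ff00u
#define HARDIRQ_MASK   0x03ff0000u
#define SOFTIRQ_SHIFT  8
#define HARDIRQ_SHIFT  16
#define PREEMPT_ACTIVE 0x10000000u  /* assumption: x86 value */

/* Hypothetical helper type holding the decoded fields. */
struct preempt_fields {
        unsigned int preempt_depth;  /* nested preempt_disable()/spin_lock() */
        unsigned int softirq_count;  /* softirq processing or bh disabled */
        unsigned int hardirq_count;  /* nested hardware interrupts */
        int preempt_active;          /* set while a reschedule is in progress */
};

/* Split a raw preempt_count value into its component counters. */
static struct preempt_fields decode_preempt_count(unsigned int count)
{
        struct preempt_fields f;

        f.preempt_depth  = count & PREEMPT_MASK;
        f.softirq_count  = (count & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
        f.hardirq_count  = (count & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
        f.preempt_active = (count & PREEMPT_ACTIVE) != 0;
        return f;
}
```

For the value in the question's log, 0x10000100, this gives a softirq count of 1 with no hardirq or preempt-disable depth, which is consistent with the hook being called from softirq context rather than with a spinlock being held.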

PS. From the mutex description in Linux documentation:

  • 'struct mutex' semantics are well-defined and are enforced if CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
    virtually no debugging code or instrumentation. The mutex subsystem
    checks and enforces the following rules:

      • only one task can hold the mutex at a time
      • only the owner can unlock the mutex
      • multiple unlocks are not permitted
      • recursive locking is not permitted
      • a mutex object must be initialized via the API
      • a mutex object must not be initialized via memset or copying
      • task may not exit with mutex held
      • memory areas where held locks reside must not be freed
      • held mutexes must not be reinitialized
      • mutexes may not be used in hardware or software interrupt contexts such as tasklets and timers

In your design, the same mutex could be taken by several callers at the same time:

  1. call 1 -> your code -> mutex_lock
  2. scheduler interrupts your code.
  3. call 2 -> your code -> mutex_lock (already locked) -> BUG

Good luck.

Sebastian Mountaniol answered Nov 15 '22 15:11