Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Isolate Kernel Module to a Specific Core Using Cpuset

Tags:

From user-space we can use cpuset to actually isolate a specific core in our system and execute just one specific process to that core.

I'm trying to do the same thing with a kernel module. So I want the module to get executed in an isolated core. In other words: How do I use cpuset's from inside a kernel module? *

Using linux/cpuset.h in my kernel module doesn't work. So, I have a module like this:

#include <linux/module.h>
#include <linux/cpuset.h>

...
#ifdef CONFIG_CPUSETS
    printk(KERN_INFO, "cpusets is enabled!");
#endif
cpuset_init(); // this function is declared in cpuset.h
...

When trying to load this module I get (in dmesg) the following message cpusets is enabled!. But I also receive the message Unknown symbol cpu_init (err 0).

Similarly, I tried using sched_setaffinity from linux/sched.h in order to move all running procceses to a specific core and then run my module to an isolated core. I got the same error mesage: Unknown symbol sched_setaffinity (err 0). I guess I got the "unknown symbols" because those functions have no EXPORT_SYMBOL in the kernel. So I went and tried to call the sys_sched_setaffinity system call (based on this question) but again got this mesage: Unknown symbol sys_sched_setaffinity (err 0)!

Furthermore, I am not looking for a solution that uses isolcpus, which is set while booting. I would like to just load the module and afterwards the isolationo to occur.

  • (More precise, I want its kernel threads to execute in isolated cores. I know that I can use affinity to bind threads to specific cores, but this does not guarantee me that the cores are going to be isolated by other processes running on them.)
like image 463
insumity Avatar asked Mar 29 '16 15:03

insumity


People also ask

How use Cpuset Linux?

If a system supports cpusets, then it will have the entry nodev cpuset in the file /proc/filesystems. By mounting the cpuset filesystem (see the EXAMPLES section below), the administrator can configure the cpusets on a system to control the processor and memory placement of processes on that system.

How do I block a kernel module?

To blacklist a kernel module permanently via GRUB, open the /etc/default/grub file for editing, and add the modprobe. blacklist=MODULE_NAME option to the GRUB_CMD_LINUX command. Then run the sudo grub2-mkconfig -o /boot/grub2/grub. cfg command to enable the changes.

What is Isolcpu?

The isolcpus kernel parameter designates a set of CPUs where the process scheduler will ignore those CPUs. That is to say, the process scheduler will not include those CPUs as targets to put processes on, migrate processes onto or off of, or take a process off a CPU.


2 Answers

So I want the module to get executed in an isolated core.

and

actually isolate a specific core in our system and execute just one specific process to that core

This is a working source code compiled and tested on a Debian box using kernel 3.16. I'll describe how to load and unload first and what the parameter passed means.

All sources can be found on github here...

https://github.com/harryjackson/doc/tree/master/linux/kernel/toy/toy

Build and load the module...

make
insmod toy param_cpu_id=2

To unload the module use

rmmod toy

I'm not using modprobe because it expects some configuration etc. The parameter we're passing to the toy kernel module is the CPU we want to isolate. None of the device operations that get called will run unless they're executing on that CPU.

Once the module is loaded you can find it here

/dev/toy

Simple operations like

cat /dev/toy

create events that the kernel module catches and produces some output. You can see the output using dmesg.

Source code...

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Harry");
MODULE_DESCRIPTION("toy kernel module");
MODULE_VERSION("0.1"); 
#define  DEVICE_NAME "toy"
#define  CLASS_NAME  "toy"

static int    param_cpu_id;
module_param(param_cpu_id    , int, (S_IRUSR | S_IRGRP | S_IROTH));
MODULE_PARM_DESC(param_cpu_id, "CPU ID that operations run on");

//static void    bar(void *arg);
//static void    foo(void *cpu);
static int     toy_open(   struct inode *inodep, struct file *fp);
static ssize_t toy_read(   struct file *fp     , char *buffer, size_t len, loff_t * offset);
static ssize_t toy_write(  struct file *fp     , const char *buffer, size_t len, loff_t *);
static int     toy_release(struct inode *inodep, struct file *fp);

static struct file_operations toy_fops = {
  .owner = THIS_MODULE,
  .open = toy_open,
  .read = toy_read,
  .write = toy_write,
  .release = toy_release,
};

static struct miscdevice toy_device = {
  .minor = MISC_DYNAMIC_MINOR,
  .name = "toy",
  .fops = &toy_fops
};

//static int CPU_IDS[64] = {0};
static int toy_open(struct inode *inodep, struct file *filep) {
  int this_cpu = get_cpu();
  printk(KERN_INFO "open: called on CPU:%d\n", this_cpu);
  if(this_cpu == param_cpu_id) {
    printk(KERN_INFO "open: is on requested CPU: %d\n", smp_processor_id());
  }
  else {
    printk(KERN_INFO "open: not on requested CPU:%d\n", smp_processor_id());
  }
  put_cpu();
  return 0;
}
static ssize_t toy_read(struct file *filep, char *buffer, size_t len, loff_t *offset){
  int this_cpu = get_cpu();
  printk(KERN_INFO "read: called on CPU:%d\n", this_cpu);
  if(this_cpu == param_cpu_id) {
    printk(KERN_INFO "read: is on requested CPU: %d\n", smp_processor_id());
  }
  else {
    printk(KERN_INFO "read: not on requested CPU:%d\n", smp_processor_id());
  }
  put_cpu();
  return 0;
}
static ssize_t toy_write(struct file *filep, const char *buffer, size_t len, loff_t *offset){
  int this_cpu = get_cpu();
  printk(KERN_INFO "write called on CPU:%d\n", this_cpu);
  if(this_cpu == param_cpu_id) {
    printk(KERN_INFO "write: is on requested CPU: %d\n", smp_processor_id());
  }
  else {
    printk(KERN_INFO "write: not on requested CPU:%d\n", smp_processor_id());
  }
  put_cpu();
  return 0;
}
static int toy_release(struct inode *inodep, struct file *filep){
  int this_cpu = get_cpu();
  printk(KERN_INFO "release called on CPU:%d\n", this_cpu);
  if(this_cpu == param_cpu_id) {
    printk(KERN_INFO "release: is on requested CPU: %d\n", smp_processor_id());
  }
  else {
    printk(KERN_INFO "release: not on requested CPU:%d\n", smp_processor_id());
  }
  put_cpu();
  return 0;
}

static int __init toy_init(void) {
  int cpu_id;
  if(param_cpu_id < 0 || param_cpu_id > 4) {
    printk(KERN_INFO "toy: unable to load module without cpu parameter\n");
    return -1;
  }
  printk(KERN_INFO "toy: loading to device driver, param_cpu_id: %d\n", param_cpu_id);
  //preempt_disable(); // See notes below
  cpu_id = get_cpu();
  printk(KERN_INFO "toy init called and running on CPU: %d\n", cpu_id);
  misc_register(&toy_device);
  //preempt_enable(); // See notes below
  put_cpu();
  //smp_call_function_single(1,foo,(void *)(uintptr_t) 1,1);
  return 0;
}

static void __exit toy_exit(void) {
    misc_deregister(&toy_device);
    printk(KERN_INFO "toy exit called\n");
}

module_init(toy_init);
module_exit(toy_exit); 

The code above contains the two methods you asked for ie isolation of CPU and on init run on an isolated core.

On init get_cpu disables preemption ie anything that comes after it will not be preempted by the kernel and will run on one core. Note, this was done kernel using 3.16, your mileage may vary depending on your kernel version but I think these API's have been around a long time

This is the Makefile...

obj-m += toy.o

all:
    make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
    make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Notes. get_cpu is declared in linux/smp.h as

#define get_cpu()   ({ preempt_disable(); smp_processor_id(); })
#define put_cpu()   preempt_enable()

so you don't actually need to call preempt_disable before calling get_cpu. The get_cpu call is a wrapper around the following sequence of calls...

preempt_count_inc();
barrier();

and put_cpu is really doing this...

barrier();
if (unlikely(preempt_count_dec_and_test())) {
  __preempt_schedule();
}   

You can get as fancy as you like using the above. Almost all of this was taken from the following sources..

Google for... smp_call_function_single

Linux Kernel Development, book by Robert Love.

http://derekmolloy.ie/writing-a-linux-kernel-module-part-2-a-character-device/

https://github.com/vsinitsyn/reverse/blob/master/reverse.c

like image 110
Harry Avatar answered Oct 18 '22 12:10

Harry


You pointed in your question:

I guess I got the "unknown symbols" because those functions have no EXPORT_SYMBOL in the kernel

I think this is the key point of your problem. I see you're including the file linux/cpuset.h which defines the method: cpuset_init among others. However, both during compilation and using the command nm we can see indicators pointing us that this function is not available:

Compiling:

root@hectorvp-pc:/home/hectorvp/cpuset/cpuset_try# make
make -C /lib/modules/3.19.0-31-generic/build M=/home/hectorvp/cpuset/cpuset_try modules 
make[1]: Entering directory '/usr/src/linux-headers-3.19.0-31-generic'
  CC [M]  /home/hectorvp/cpuset/cpuset_try/cpuset_try.o
  Building modules, stage 2. 
  MODPOST 1 modules 
  WARNING: "cpuset_init" [/home/hectorvp/cpuset/cpuset_try/cpuset_try.ko] undefined!
  CC      /home/hectorvp/cpuset/cpuset_try/cpuset_try.mod.o
  LD [M]  /home/hectorvp/cpuset/cpuset_try/cpuset_try.ko
make[1]: Leaving directory '/usr/src/linux-headers-3.19.0-31-generic'

See the WARNING: "cupset_init" [...] undefined!. And using nm:

root@hectorvp-pc:/home/hectorvp/cpuset/cpuset_try# nm cpuset_try.ko
0000000000000030 T cleanup_module
                 U cpuset_init
                 U __fentry__
0000000000000000 T init_module
000000000000002f r __module_depends
                 U printk
0000000000000000 D __this_module
0000000000000000 r __UNIQUE_ID_license0
000000000000000c r __UNIQUE_ID_srcversion1
0000000000000038 r __UNIQUE_ID_vermagic0
0000000000000000 r ____versions

(Note: U stands for 'undefined')

However, I've been exploring the kernel's symbols as follow:

root@hectorvp-pc:/home/hectorvp/cpuset/cpuset_try# cat /proc/kallsyms | grep cpuset_init
ffffffff8110dc40 T cpuset_init_current_mems_allowed
ffffffff81d722ae T cpuset_init
ffffffff81d72342 T cpuset_init_smp

I see it's exported but it isn't available in /lib/modules/$(uname -r)/build/Module.symvers. So you're right.

After further investigation I found it's actually defined in:

http://lxr.free-electrons.com/source/kernel/cpuset.c#L2101

This is the function you need to call as it is available in the kernel space. Thus you won't need access to the user space.

The work around I found to make the module able to call this symbols is reported in the second answer of this question. Notice that you don't need to include linux/cpuset.h anymore:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
//#include <linux/cpuset.h>
#include <linux/kallsyms.h>


int init_module(void)
{
        static void (*cpuset_init_p)(void);
        cpuset_init_p = (void*) kallsyms_lookup_name("cpuset_init");
        printk(KERN_INFO "Starting ...\n");
        #ifdef CONFIG_CPUSETS
            printk(KERN_INFO "cpusets is enabled!");
        #endif
        (*cpuset_init_p)();
        /* 
         * A non 0 return means init_module failed; module can't be loaded. 
         */
        return 0;
}

void cleanup_module(void)
{
        printk(KERN_INFO "Ending ...\n");
}

MODULE_LICENSE("GPL");

I compiled it successfully and installed with insmod. Bellow is the output I got in dmesg:

[ 1713.738925] Starting ...
[ 1713.738929] cpusets is enabled!
[ 1713.738943] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 1713.739042] BUG: unable to handle kernel paging request at ffffffff81d7237b
[ 1713.739074] IP: [<ffffffff81d7237b>] cpuset_init+0x0/0x94
[ 1713.739102] PGD 1c16067 PUD 1c17063 PMD 30bc74063 PTE 8000000001d72163
[ 1713.739136] Oops: 0011 [#1] SMP 
[ 1713.739153] Modules linked in: cpuset_try(OE+) xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables x_tables nf_nat nf_conntrack br_netfilter bridge stp llc pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) aufs binfmt_misc cfg80211 nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl snd_hda_codec_generic iosf_mbi snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_controller snd_hda_codec snd_hwdep coretemp kvm_intel amdkfd kvm snd_pcm snd_seq_midi snd_seq_midi_event amd_iommu_v2 snd_rawmidi radeon snd_seq crct10dif_pclmul crc32_pclmul snd_seq_device aesni_intel ttm aes_x86_64 drm_kms_helper drm snd_timer i2c_algo_bit dcdbas mei_me lrw gf128mul mei snd glue_helper ablk_helper
[ 1713.739533]  cryptd soundcore shpchp lpc_ich serio_raw 8250_fintek mac_hid video parport_pc ppdev lp parport autofs4 hid_generic usbhid hid e1000e ahci psmouse ptp libahci pps_core
[ 1713.739628] CPU: 2 PID: 24679 Comm: insmod Tainted: G           OE  3.19.0-56-generic #62-Ubuntu
[ 1713.739663] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A03 09/17/2013
[ 1713.739693] task: ffff8800d29f09d0 ti: ffff88009177c000 task.ti: ffff88009177c000
[ 1713.739723] RIP: 0010:[<ffffffff81d7237b>]  [<ffffffff81d7237b>] cpuset_init+0x0/0x94
[ 1713.739757] RSP: 0018:ffff88009177fd10  EFLAGS: 00010292
[ 1713.739779] RAX: 0000000000000013 RBX: ffffffff81c1a080 RCX: 0000000000000013
[ 1713.739808] RDX: 000000000000c928 RSI: 0000000000000246 RDI: 0000000000000246
[ 1713.739836] RBP: ffff88009177fd18 R08: 000000000000000a R09: 00000000000003db
[ 1713.739865] R10: 0000000000000092 R11: 00000000000003db R12: ffff8800ad1aaee0
[ 1713.739893] R13: 0000000000000000 R14: ffffffffc0947000 R15: ffff88009177fef8
[ 1713.739923] FS:  00007fbf45be8700(0000) GS:ffff88031dd00000(0000) knlGS:0000000000000000
[ 1713.739955] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1713.739979] CR2: ffffffff81d7237b CR3: 00000000a3733000 CR4: 00000000001407e0
[ 1713.740007] Stack:
[ 1713.740016]  ffffffffc094703e ffff88009177fd98 ffffffff81002148 0000000000000001
[ 1713.740052]  0000000000000001 ffff8802479de200 0000000000000001 ffff88009177fd78
[ 1713.740087]  ffffffff811d79e9 ffffffff810fb058 0000000000000018 ffffffffc0949000
[ 1713.740122] Call Trace:
[ 1713.740137]  [<ffffffffc094703e>] ? init_module+0x3e/0x50 [cpuset_try]
[ 1713.740175]  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 1713.740190]  [<ffffffff811d79e9>] ? kmem_cache_alloc_trace+0x189/0x200
[ 1713.740207]  [<ffffffff810fb058>] ? load_module+0x15b8/0x1d00
[ 1713.740222]  [<ffffffff810fb092>] load_module+0x15f2/0x1d00
[ 1713.740236]  [<ffffffff810f6850>] ? store_uevent+0x40/0x40
[ 1713.740250]  [<ffffffff810fb916>] SyS_finit_module+0x86/0xb0
[ 1713.740265]  [<ffffffff817ce10d>] system_call_fastpath+0x16/0x1b
[ 1713.740280] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0c 53 58 31 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 1c 00 00 00 c0 92 2c 7d c0 92 2c 7d a0 fc 69 ee 
[ 1713.740398] RIP  [<ffffffff81d7237b>] cpuset_init+0x0/0x94
[ 1713.740413]  RSP <ffff88009177fd10>
[ 1713.740421] CR2: ffffffff81d7237b
[ 1713.746177] ---[ end trace 25614103c0658b94 ]---

Despite the errors, I'd say I've answered your initial question:

How do I use cpuset's from inside a kernel module? *

Probably not in the most elegant way as I'm not an expert at all. You need to continue from here.

Regards

like image 28
Héctor Valverde Pareja Avatar answered Oct 18 '22 11:10

Héctor Valverde Pareja