
How can a Linux character device driver detect when a program using it exits abnormally?

I have a Linux character device driver that creates a /dev/mything entry, and then a C++/Qt program that opens the device and uses it. If that program exits correctly, with exit(), the device is closed and the driver properly resets itself. But if the program exits abnormally, via segfault or SIGINT or something, the device is not properly closed.

My current workaround is to reload the driver if it gets stuck in the "open" state.

This line in the driver tries to prevent multiple programs using the device simultaneously:

int mything_open( struct inode* inode, struct file* filp ) {
    ...
    if ( port->rings[bufcount].virt_addr ) return -EBUSY;
    ...
}

Then this cleans it up:

int mything_release( struct inode* inode, struct file* filp ) {
    ...
    port->rings[bufcount].virt_addr = NULL;
    ...
}

I think exit() is causing mything_release to be called but SIGINT is not. How can I make the driver more robust to this sort of situation?

EDIT:

Here are the operations I have implemented. Maybe I'm missing something?

static struct file_operations fatpipe_fops = {
    .owner =    THIS_MODULE,
    .open =     mything_open,
    .release =  mything_release,
    .read =     mything_read,
    .write =    mything_write,
    .ioctl =    mything_ioctl
};

Dave Ceddia asked Jun 25 '12 14:06




2 Answers

There is no need for this test; the problem is not abnormal program termination (which, from your driver's standpoint, is exactly like a normal close on the device) but instead a problem in the state keeping of your device. In other words, if you inserted close(dev_fd) or even exit(0) at the exact point where your program is crashing, you'd have the same problem.
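To check the claim that abnormal termination looks like an ordinary close, here is a minimal userspace sketch (not driver code). It uses a pipe in place of /dev/mything: the child holds the only write end, is killed by a signal, and the parent then sees end-of-file, which only happens once the kernel has released the child's descriptor. The function name `fd_closed_after_signal` is made up for this illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the kernel closed the child's descriptor after the child
 * was terminated by `sig`, 0 if not, and -1 on a setup error.
 */
int fd_closed_after_signal(int sig)
{
    int fds[2];
    char byte;
    int status;

    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: hold the write end open and never close it voluntarily. */
        close(fds[0]);
        signal(sig, SIG_DFL);           /* make sure the signal is fatal */
        if (write(fds[1], "r", 1) != 1) /* tell the parent we are ready */
            _exit(1);
        for (;;)
            pause();
    }

    /* Parent: drop our copy so the child is the only writer left. */
    close(fds[1]);

    /* Wait for the child's "ready" byte before sending the signal. */
    if (read(fds[0], &byte, 1) != 1 || kill(pid, sig) != 0) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        close(fds[0]);
        return -1;
    }

    /* read() returns 0 (EOF) only once every write end has been closed. */
    ssize_t n = read(fds[0], &byte, 1);

    waitpid(pid, &status, 0);
    close(fds[0]);

    return (n == 0 && WIFSIGNALED(status) && WTERMSIG(status) == sig) ? 1 : 0;
}
```

Calling `fd_closed_after_signal(SIGINT)` or `fd_closed_after_signal(SIGKILL)` from a small `main` should return 1 on Linux. For a character device, the same teardown path ends in the driver's `release` callback.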

You should figure out what part of your driver's behavior is causing it to remain in a busy state and fix that.

R.. GitHub STOP HELPING ICE answered Oct 14 '22 20:10



The problem boiled down to this line in mything_release, put in to wait for some memory writes to complete:

if (wait_event_interruptible_timeout(port->inq, false, 10)) return -ERESTARTSYS;

With a normal program exit, this would sleep for 10 jiffies, time out, and continue along. But with an abnormal exit from SIGINT or something, I think the interruptible wait got interrupted by the pending signal and returned -ERESTARTSYS, so my if returned the same from mything_release, presumably before the cleanup had run.

What worked for me was to drop the if and simply wait, ignoring the return value:

wait_event_interruptible_timeout(port->inq, false, 10);
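To show why the early return leaves the device stuck, here is a small userspace model, not kernel code. It assumes the wait sits before the line that clears virt_addr, which the snippets above do not show. Every name starting with `model_` or `MODEL_` is invented for this sketch, and `model_wait` only mimics the two return values relevant here (0 on timeout, -ERESTARTSYS when a signal is pending).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal error code, redefined here because userspace lacks it. */
#define MODEL_ERESTARTSYS 512

struct model_port {
    void *virt_addr;    /* non-NULL means "device is open" */
};

/* Stand-in for wait_event_interruptible_timeout(port->inq, false, 10). */
int model_wait(bool signal_pending)
{
    return signal_pending ? -MODEL_ERESTARTSYS : 0;
}

int model_open(struct model_port *port, void *buf)
{
    if (port->virt_addr)
        return -EBUSY;  /* same guard as mything_open */
    port->virt_addr = buf;
    return 0;
}

/* Original behaviour: an interrupted wait skips the cleanup. */
int model_release_buggy(struct model_port *port, bool signal_pending)
{
    int rc = model_wait(signal_pending);
    if (rc)
        return rc;      /* virt_addr stays set, so open() keeps failing */
    port->virt_addr = NULL;
    return 0;
}

/* Fixed behaviour: ignore the wait result and always clean up. */
int model_release_fixed(struct model_port *port, bool signal_pending)
{
    (void)model_wait(signal_pending);
    port->virt_addr = NULL;
    return 0;
}
```

In this model, a `model_release_buggy(&port, true)` call is followed by `model_open` returning -EBUSY, while `model_release_fixed` always leaves the port reusable. One caveat for the real driver: with a signal pending, the interruptible wait may return immediately rather than sleeping, so if the delay itself matters, the non-interruptible `wait_event_timeout` may be worth considering.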

This patch from years ago made me believe that returning ERESTARTSYS from a close/_release function is not a good idea: http://us.generation-nt.com/answer/patch-fix-wrong-error-code-interrupted-close-syscalls-help-181191441.html

Dave Ceddia answered Oct 14 '22 19:10
