In LDD3, I saw the following code:
static unsigned int scull_p_poll(struct file *filp, poll_table *wait)
{
    struct scull_pipe *dev = filp->private_data;
    unsigned int mask = 0;
    /*
     * The buffer is circular; it is considered full
     * if "wp" is right behind "rp" and empty if the
     * two are equal.
     */
    down(&dev->sem);
    poll_wait(filp, &dev->inq,  wait);
    poll_wait(filp, &dev->outq, wait);
    if (dev->rp != dev->wp)
        mask |= POLLIN | POLLRDNORM;    /* readable */
    if (spacefree(dev))
        mask |= POLLOUT | POLLWRNORM;   /* writable */
    up(&dev->sem);
    return mask;
}
But the book says poll_wait won't wait and will return immediately. Then why do we need to call it? Why can't we just return mask?
When poll() is called on a file descriptor, the corresponding device poll_xyz() method registered in the file_operations structure is invoked in kernel space. This method checks whether data is readily available; if so, the event mask is set and poll() returns to user space.
poll and select have essentially the same functionality: both allow a process to determine whether it can read from or write to one or more open files without blocking. They are thus often used in applications that must use multiple input or output streams without blocking on any one of them.
select() uses at most three bits of data per file descriptor, while poll() typically uses 64 bits per file descriptor. On each invocation, poll() therefore has to copy a lot more data into kernel space.
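The 64-bit figure comes from the layout of struct pollfd (an int fd plus two short fields for events and revents). Here is a quick check you can compile yourself; pollfd_bits is just a name I made up, and the result assumes a common ABI with a 4-byte int and a 2-byte short:

```c
#include <assert.h>
#include <limits.h>
#include <poll.h>

/* Hypothetical helper: size of one struct pollfd entry, in bits. */
int pollfd_bits(void)
{
    /* The struct must at least hold fd (int) plus events and revents (short). */
    assert(sizeof(struct pollfd) >= sizeof(int) + 2 * sizeof(short));
    return (int)(sizeof(struct pollfd) * CHAR_BIT);
}
```

On typical Linux systems this should come out as 64. select() instead carries one bit per descriptor in each of its read, write and except sets, which is where the "three bits" comes from.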
If none of the events requested (and no error) has occurred for any of the file descriptors, then poll() blocks until one of the events occurs. The timeout argument specifies the number of milliseconds that poll() should block waiting for a file descriptor to become ready.
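To see the blocking and timeout behaviour from user space, here is a minimal sketch using a pipe. The helpers wait_readable and demo_pipe_poll are names invented for this example; only poll(), pipe(), write() and close() are real calls, and the 50 ms timeout is an arbitrary choice:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is readable, 0 on timeout, -1 otherwise. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);      /* blocks for at most timeout_ms */

    if (n < 0)
        return -1;                          /* poll() itself failed */
    if (n == 0)
        return 0;                           /* timeout, nothing became ready */
    return (pfd.revents & POLLIN) ? 1 : -1; /* -1: only POLLERR/POLLHUP etc. */
}

/* Hypothetical demo: returns 1 if poll() behaved as described above. */
int demo_pipe_poll(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    assert(fds[0] >= 0 && fds[1] >= 0);

    int before = wait_readable(fds[0], 50);  /* empty pipe: should time out */
    ssize_t written = write(fds[1], "x", 1); /* make the read end ready */
    int after = wait_readable(fds[0], 50);   /* data pending: no need to wait */

    close(fds[0]);
    close(fds[1]);
    return written == 1 && before == 0 && after == 1;
}
```

The first call is the "no events, so block until timeout" case; the second returns right away because the event is already pending.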
poll_wait adds your device (represented by the "struct file") to the list of those that can wake the process up.
The idea is that the process can use poll (or select or epoll etc) to add a bunch of file descriptors to the list on which it wishes to wait. The poll entry for each driver gets called. Each one adds itself (via poll_wait) to the waiter list.
Then the core kernel blocks the process in one place. That way, any one of the devices can wake up the process. If you return non-zero mask bits, that means those "ready" attributes (readable/writable/etc) apply now.
So, in pseudo-code, it's roughly like this:
foreach fd:
    find device corresponding to fd
    call device poll function to set up wait queues (with poll_wait) and to collect its "ready-now" mask
while time remaining in timeout and no devices are ready:
    sleep
return from system call (either due to timeout or to ready devices)
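If it helps, the pseudo-code above can be turned into a toy user-space model. This is not kernel code: every toy_* name is invented, time is a simple tick counter, and "hardware events" are faked with a ready_at_tick field. Registering only on the first pass is a modelling choice, although I believe it mirrors what the kernel's poll loop does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POLLIN  0x1u
#define MAX_WAITERS 8

struct toy_dev {
    unsigned int ready;   /* "ready-now" mask the driver would compute */
    int ready_at_tick;    /* simulated time at which data arrives */
};

struct toy_table {
    struct toy_dev *queues[MAX_WAITERS];  /* devices that may wake us up */
    size_t n;
};

/* Model of poll_wait(): it only records the registration, it never sleeps. */
void toy_poll_wait(struct toy_dev *dev, struct toy_table *wait)
{
    if (wait == NULL)
        return;                 /* nothing to register on later passes */
    assert(wait->n < MAX_WAITERS);
    wait->queues[wait->n++] = dev;
}

/* Model of a driver's poll method: register, then report the current mask. */
unsigned int toy_dev_poll(struct toy_dev *dev, struct toy_table *wait)
{
    toy_poll_wait(dev, wait);
    return dev->ready;
}

/* Model of the core loop: returns the number of ready devices, 0 on timeout. */
int toy_do_poll(struct toy_dev *devs, size_t ndevs, int timeout_ticks,
                struct toy_table *table)
{
    for (int tick = 0; ; tick++) {
        int nready = 0;

        for (size_t i = 0; i < ndevs; i++) {
            if (devs[i].ready_at_tick <= tick)
                devs[i].ready = TOY_POLLIN;   /* simulated hardware event */
            /* Register with the wait list only on the first pass. */
            if (toy_dev_poll(&devs[i], tick == 0 ? table : NULL))
                nready++;
        }
        if (nready > 0 || tick >= timeout_ticks)
            return nready;
        /* The real kernel would sleep here until a wait queue wakes it. */
    }
}
```

Note that toy_poll_wait returns immediately every time; the only place where the model "waits" is the outer loop in toy_do_poll, which is the point of the original question.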
The poll system call sleeps if your poll file_operation returns 0
This is what was confusing me.
When you return non-zero, it means that some event has fired, and the caller is woken up (or never goes to sleep in the first place).
Once you see this, it is clear that something must be tying the process to the wait queue, and that thing is poll_wait.
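Also remember that struct file represents "a connection between a process and an open file", not just a filesystem file. The waiting process itself is tied in through the poll_table passed to poll_wait (as far as I can tell from fs/select.c, the kernel records the polling task there), rather than through a pid stored in struct file.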
Playing with a minimal runnable example might also help clear things up: https://stackoverflow.com/a/44645336/895245