How to parse large amount of data passed to kernel module through /proc file?

Tags: c, linux, buffer, kernel, procfs

Edit: I have found seq_file that eases writing a lot of data from kernel to user-space. What I am looking for is the opposite; an API that facilitates reading a lot of data (more than one page) from user-space.

Edit 2: I am implementing a port of <stdio.h> as a kernel module that would be able to open /proc (and later, other virtual file systems) similar to FILEs and handle input and output similar to <stdio.h>. You can find the project here.

I have found a LOT of questions on how the kernel can write large amounts of data to /proc (for user-space programs to take), but nothing for the other way around. Let me elaborate:

This question is basically about the algorithm by which the input is tokenized (for example, into ints, or a mixture of ints and strings), given that the data may be broken across multiple buffers.

For example, imagine the following data is being sent to the kernel module:

12345678 81234567 78123456 67812345 5678 1234 45678123 3456 7812 23456781

and for the sake of this example, let's say the page size by which Linux feeds the /proc handler is 20 bytes (vs the real 4KB).

The function that reads the data from /proc (in the kernel module) then sees the data as such:

call 1:
"12345678 81234567 78"
call 2:
"123456 67812345 5678"
call 3:
" 1234 45678123 3456 "
call 4:
"7812 23456781"

As you can see, when 78 is read in the first call, it shouldn't be processed yet: the parser has to wait for the next frame to decide whether 78 was a whole number or one cut between frames.
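To reproduce this framing outside the kernel, a hypothetical helper `next_chunk` can hand out a string in fixed-size pieces, the way successive calls to the /proc write handler would receive it. This is only a user-space sketch; the 20-byte chunk size is the example's assumption, not a kernel constant.

```c
#include <stddef.h>
#include <string.h>

/* Hand out the next piece of data[0..total), at most chunk_size bytes,
 * starting at *offset, and advance *offset past it.
 * Returns the piece's length, or 0 once all data has been handed out. */
size_t next_chunk(const char *data, size_t total, size_t *offset,
                  size_t chunk_size, const char **chunk)
{
    size_t remaining = total - *offset;
    size_t len = remaining < chunk_size ? remaining : chunk_size;

    *chunk = data + *offset;
    *offset += len;
    return len;
}
```

Calling it in a loop with a chunk size of 20 yields exactly the four calls listed above; the first piece ends in the middle of `78123456`.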

I also found seq_file, but it apparently only helps when the kernel writes data to the user, not when it reads data from the user (or it could be that the HOWTO is horribly written).

What I have done

So far, I have come up with the following solution (I am writing from memory, so I may have missed a couple of error checks, but bear with me):

In the initialization phase (say init_module):

initialize mutex1 to 1 and mutex2 to 0
create /proc entry
call data_processor

/proc reader:

1. down(mutex1)    /* down_interruptible of course, but let's not get into details */

2. copy_from_user to an internal buffer
   buffer_index = 0
   data_length = whatever the size is

3. strip spaces from end of buffer (except if all left from buffer is 1 space)
   if so, there_was_space_after = 1 else 0

4. up(mutex2)

I will explain why I strip spaces later.
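Step 3 of the /proc reader can be isolated into a small function. The sketch below is one interpretation of that step: it assumes "spaces" means any `isspace()` character, that stripping stops when a single character is left, and that the flag records whether the chunk originally ended in whitespace. The name `strip_trailing_space` is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Step 3 of the /proc reader (interpretation): drop trailing whitespace
 * from buf[0..len) but never shrink the buffer below one character.
 * *there_was_space_after is 1 if the chunk ended in whitespace, i.e. its
 * last token is known to be complete. Returns the new length. */
size_t strip_trailing_space(const char *buf, size_t len,
                            int *there_was_space_after)
{
    *there_was_space_after = len > 0 && isspace((unsigned char)buf[len - 1]);

    while (len > 1 && isspace((unsigned char)buf[len - 1]))
        --len;
    return len;
}
```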

get_int function:

wait_for_next = 0
number_was_cut = 0
last_number = 0

do
{
    1. down(mutex2)

    2. if (number_was_cut && !isdigit(buffer[buffer_index]))
           up(mutex2)     /* buffer still unread: let the next get_int use it */
           break          /* turns out it wasn't really cut
                             as beginning of next buffer is ' ' */
       number_was_cut = 0
       wait_for_next = 0

    3. while (buffer_index < data_length && !isdigit(buffer[buffer_index]))
           ++buffer_index;    /* skip white space */

    4. while (buffer_index < data_length && isdigit(buffer[buffer_index]))
           last_number = last_number * 10 + buffer[buffer_index++] - '0';

    5. if (buffer_index >= data_length && !there_was_space_after)
           number_was_cut = 1
           wait_for_next = 1
           up(mutex1)         /* let more data come in */
       else
           up(mutex2)         /* let get_int continue */
           break
} while (wait_for_next)

return last_number

data_processor function (for example):

int first_num = get_int()
int second_num = get_int()
for i = first_num to second_num
    do_whatever(get_int())

Explanation: First, see data_processor. It doesn't get involved in the details of how the data are read; it just gets integers and does whatever it wants with them. Now let's look at the /proc reader. It waits for data_processor to call get_int enough times for all current data to be consumed (step 1), then copies the next buffer into internal memory (step 2). It then strips trailing spaces so that get_int can be simplified a bit (step 3). Finally, it signals get_int that it can start reading the data (step 4).

The get_int function first waits for data to arrive (step 1), then (ignoring step 2 for now) skips any unwanted characters (step 3) and starts reading the number (step 4). Reading the number ends in one of two ways: either the end of the buffer is reached (in which case, if the /proc reader did not strip any spaces, the number may be cut between frames) or whitespace is met. In the former case, get_int signals the /proc reader to read in more data and waits for another cycle to append the rest of the number to the current one; in the latter case, it returns the number (step 5). When continuing from the last frame, it checks whether the new frame starts with a digit. If not, the previous number was actually complete and is returned; otherwise, it continues appending digits to the last number (step 2).
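The blocking in get_int exists only to carry a half-read number from one chunk to the next. For comparison, the same bookkeeping can live in a struct that survives between calls, so each chunk can be parsed as soon as it arrives. This is a user-space sketch of that alternative (an incremental, push-style parser), not the author's code; `int_parser`, `parser_feed` and `parser_finish` are hypothetical names, the output array is a fixed-size stand-in, and overflow checking is omitted.

```c
#include <ctype.h>
#include <stddef.h>

/* State carried from one chunk to the next. */
struct int_parser {
    long   value;      /* digits accumulated so far */
    int    in_number;  /* 1 if we are currently inside a number */
    long   out[64];    /* parsed numbers (fixed capacity for the sketch) */
    size_t count;
};

static void emit(struct int_parser *p)
{
    if (p->count < sizeof p->out / sizeof p->out[0])
        p->out[p->count++] = p->value;
    p->value = 0;
    p->in_number = 0;
}

/* Consume one chunk. A number cut at the end of the chunk stays in
 * p->value until a later chunk (or parser_finish) completes it. */
void parser_feed(struct int_parser *p, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)buf[i];

        if (isdigit(ch)) {
            p->value = p->value * 10 + (ch - '0');
            p->in_number = 1;
        } else if (p->in_number) {
            emit(p);   /* any non-digit terminates the number */
        }
    }
}

/* End of input: a pending number is now known to be complete. */
void parser_finish(struct int_parser *p)
{
    if (p->in_number)
        emit(p);
}
```

This needs neither the trailing-space stripping nor the two semaphores, but it inverts control: numbers are delivered as they are found instead of when data_processor asks for one. Keeping data_processor's pull style (calling get_int() on demand) is what forces the blocking handshake.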

Problem

The main problem with this method is that it is overly complicated. It gets much more complicated when get_string is added, or when the integers could be in hex, etc. Basically, you have to reinvent sscanf! Note that sscanf could be used in this simple example at step 4 of get_int instead of the while loop (and also in get_string), but that gets trickier when hex input is possible (imagine a hex number cut between 0 and x0212ae4). Even so, it only replaces step 4 of get_int; everything else would still remain.

It actually cost me many bugs and a lot of testing to get all the special cases right, which is another reason it doesn't look elegant to me.
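One way to keep using the standard conversion functions despite cut tokens is to collect the current token in a small buffer that persists across chunks and convert it only once whitespace (or the end of input) proves it complete. The user-space sketch below uses hypothetical names (`struct tokenizer`, `tokenizer_feed`), assumes tokens never exceed 64 bytes, and shows the hex case: `0` at the end of one chunk and `x0212ae4` at the start of the next are joined before `strtoul(..., 0)` sees them. In a module, `kstrtoul()` with base 0 would presumably play the role of `strtoul()`.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#define TOK_MAX 64            /* assumed upper bound on one token */
#define OUT_MAX 32            /* fixed result capacity for the sketch */

struct tokenizer {
    char   tok[TOK_MAX + 1];  /* current token; survives across chunks */
    size_t len;
    unsigned long out[OUT_MAX];
    size_t count;
};

/* The token in t->tok is known to be complete: convert it. */
static void flush_token(struct tokenizer *t)
{
    if (t->len > 0 && t->count < OUT_MAX) {
        t->tok[t->len] = '\0';
        /* base 0: strtoul accepts decimal, 0x-prefixed hex and octal */
        t->out[t->count++] = strtoul(t->tok, NULL, 0);
    }
    t->len = 0;
}

/* Feed one chunk; pass last = 1 with the final chunk of the input. */
void tokenizer_feed(struct tokenizer *t, const char *buf, size_t len, int last)
{
    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)buf[i]))
            flush_token(t);
        else if (t->len < TOK_MAX)   /* sketch: overlong tokens are truncated */
            t->tok[t->len++] = buf[i];
    }
    if (last)
        flush_token(t);
}
```

The only cross-chunk state is the unfinished token, so the conversion itself (decimal, hex, or a string token) no longer needs to know about chunk boundaries.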

Questions

I would like to know if there is any better method to handle this. I am aware that using shared memory could be an option, but I'm looking for an algorithm for this task (more out of curiosity since I already have my working solution). More specifically:

Is there an already-implemented mechanism in the Linux kernel that can be treated like a normal C FILE, from which you can take data while it handles the splitting of data into pages itself?
If not, am I over-complicating things, and am I missing an obvious simple solution?
I believe fscanf faces a similar problem. How does it handle it?
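Conceptually, fscanf never sees buffer boundaries because it pulls: the conversion code asks for one character at a time (like `getc`), the `FILE` buffer is refilled with `read()` when it runs empty (and `read()` simply blocks until more data exists), and a one-character pushback (like `ungetc`) answers "is the next byte still part of the number?". The sketch below imitates that design in user space with a hypothetical `struct reader` and refill callback; it is a simplified model, not glibc's actual implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define READER_EOF (-1)

/* Refill callback: copy up to cap bytes into buf, return the count,
 * or 0 at end of input. In a stdio FILE this role is played by read(2). */
typedef size_t (*refill_fn)(void *src, char *buf, size_t cap);

struct reader {
    char buf[20];            /* deliberately tiny, like the 20-byte example */
    size_t pos, len;         /* next unread byte, bytes currently buffered */
    refill_fn refill;
    void *src;
};

/* getc-like: return the next byte, refilling when the buffer runs dry. */
static int reader_getc(struct reader *r)
{
    if (r->pos == r->len) {
        r->len = r->refill(r->src, r->buf, sizeof r->buf);
        r->pos = 0;
        if (r->len == 0)
            return READER_EOF;
    }
    return (unsigned char)r->buf[r->pos++];
}

/* ungetc-like: push back the byte just returned by reader_getc(). */
static void reader_ungetc(struct reader *r)
{
    r->pos--;
}

/* Roughly fscanf's %ld for non-negative decimals: returns 1 on success,
 * 0 at end of input or on a non-digit. Chunk boundaries never appear
 * here; they are hidden inside reader_getc(). */
int reader_get_long(struct reader *r, long *v)
{
    int ch;

    do
        ch = reader_getc(r);
    while (ch != READER_EOF && isspace(ch));

    if (ch == READER_EOF)
        return 0;
    if (!isdigit(ch)) {
        reader_ungetc(r);
        return 0;
    }
    *v = 0;
    while (ch != READER_EOF && isdigit(ch)) {
        *v = *v * 10 + (ch - '0');
        ch = reader_getc(r);
    }
    if (ch != READER_EOF)
        reader_ungetc(r);    /* leave the terminator for the next call */
    return 1;
}

/* Example source: hands out a string at most `step` bytes at a time. */
struct string_src {
    const char *s;
    size_t off, total, step;
};

static size_t string_refill(void *src, char *buf, size_t cap)
{
    struct string_src *ss = src;
    size_t n = ss->total - ss->off;

    if (n > ss->step)
        n = ss->step;
    if (n > cap)
        n = cap;
    memcpy(buf, ss->s + ss->off, n);
    ss->off += n;
    return n;
}
```

In the module, the refill callback is where the consumer would have to wait for the next write to /proc, which is essentially what the mutex1/mutex2 handshake does; the pull model just concentrates that waiting in one function instead of spreading it through every get_* routine.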

Side question: Is it a terrible thing that I'm blocking the /proc reader on a mutex? I mean, writes can block, but I'm not sure whether that normally happens in user space or in kernel space.


asked Mar 22 '12 22:03

Shahbaz

1 Answer

The request_firmware() interface may be of interest to you; the whole thing gets buffered by the kernel before it's handed to you.

Otherwise, maybe the sysfs binary attributes interface is more useful than proc?
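Both suggestions amount to buffering the whole input first and parsing only once it is complete, at which point the chunk-boundary problem disappears and a plain `strtol`/`sscanf` loop works. The sketch below does not use `request_firmware()` or the sysfs binary-attribute API; it only illustrates the accumulate-then-parse idea in user space with hypothetical names (`struct accum`, `accum_append`, `parse_all`). In a module, the buffer would be grown with a kernel allocator such as `krealloc()` and its total size should be capped.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Growable buffer that collects every chunk before any parsing. */
struct accum {
    char  *data;
    size_t len, cap;
};

/* Append one chunk; returns 0 on success, -1 on allocation failure.
 * The buffer is kept NUL-terminated so it can be parsed as a string. */
int accum_append(struct accum *a, const char *chunk, size_t n)
{
    if (a->len + n + 1 > a->cap) {
        size_t cap = a->cap ? a->cap : 64;
        char *p;

        while (cap < a->len + n + 1)
            cap *= 2;
        p = realloc(a->data, cap);
        if (!p)
            return -1;
        a->data = p;
        a->cap = cap;
    }
    memcpy(a->data + a->len, chunk, n);
    a->len += n;
    a->data[a->len] = '\0';
    return 0;
}

/* Once all chunks are in, parse with plain strtol: no boundary cases. */
size_t parse_all(const char *s, long *out, size_t max)
{
    size_t count = 0;
    char *end;

    while (count < max) {
        long v = strtol(s, &end, 10);   /* skips leading whitespace */
        if (end == s)
            break;                      /* no more numbers */
        out[count++] = v;
        s = end;
    }
    return count;
}
```

The trade-off is memory: the whole input must fit in one buffer before any of it is processed, whereas the streaming approaches in the question handle each page as it arrives.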


answered Oct 03 '22 09:10

blueshift
