Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is glibc's sscanf vastly slower than fscanf on Linux?

I am using GCC 4.8 and glibc 2.19 on an x86_64 Linux.

While playing with different input methods for a different question, I compared fscanf and sscanf. Specifically, I would either use fscanf on the standard input directly:

char s[128]; int n;  while (fscanf(stdin, "%127s %d", s, &n) == 2) { } 

Or I would first read the entire input into a buffer and then traverse the buffer with sscanf. (Reading everything into the buffer takes a tiny amount of time.)

char s[128]; int n; char const * p = my_data;  for (int b; sscanf(p, "%127s %d%n", s, &n, &b) == 2; p += b) { } 

To my surprise, the fscanf version is vastly faster. For example, processing seveal tens of thousands of lines with fscanf takes this long:

10000       0.003927487 seconds time elapsed 20000       0.006860206 seconds time elapsed 30000       0.007933329 seconds time elapsed 40000       0.012881912 seconds time elapsed 50000       0.013516816 seconds time elapsed 60000       0.015670432 seconds time elapsed 70000       0.017393129 seconds time elapsed 80000       0.019837480 seconds time elapsed 90000       0.023925753 seconds time elapsed 

Now the same with sscanf:

10000       0.035864643 seconds time elapsed 20000       0.127150772 seconds time elapsed 30000       0.319828373 seconds time elapsed 40000       0.611551668 seconds time elapsed 50000       0.919187459 seconds time elapsed 60000       1.327831544 seconds time elapsed 70000       1.809843039 seconds time elapsed 80000       2.354809588 seconds time elapsed 90000       2.970678416 seconds time elapsed 

I was using the Google perf tools to measure this. For example, for 50000 lines, the fscanf code requires about 50M cycles, and the sscanf code about 3300M cycles. So I broke down the top call sites with perf record/perf report. With fscanf:

 35.26%  xf  libc-2.19.so         [.] _IO_vfscanf  23.91%  xf  [kernel.kallsyms]    [k] 0xffffffff8104f45a   8.93%  xf  libc-2.19.so         [.] _int_malloc 

And with sscanf:

 98.22%  xs  libc-2.19.so         [.] rawmemchr   0.68%  xs  libc-2.19.so         [.] _IO_vfscanf   0.38%  xs  [kernel.kallsyms]    [k] 0xffffffff8104f45a 

So almost all of the time with sscanf is spent in rawmemchr! Why is this? How can the fscanf code avoid this cost?

I tried searching for this, but the best I could come up with is this discussion of locked realloc calls which I don't think applies here. I was also thinking that fscanf has better memory locality (using the same buffer over and over), but that can't make such a big difference.

Does anyone have any insights in this strange discrepancy?

like image 373
Kerrek SB Avatar asked May 29 '14 00:05

Kerrek SB


1 Answers

sscanf() converts the string you pass in to an _IO_FILE* to make the string look like a "file". This is so the same internal _IO_vfscanf() can be used for both a string and a FILE*.

However, as part of that conversion, done in a _IO_str_init_static_internal() function, it calls __rawmemchr (ptr, '\0'); essentially a strlen() call, on your input string. This conversion is done on every call to sscanf(), and since your input buffer is rather large, it'll spend a fair amount of time calculating the length of the input string.

Creating a FILE* from the input string using fmemopen() and use fscanf() could be another alternative.

like image 199
nos Avatar answered Sep 29 '22 11:09

nos