
Consistency and behavior of %n operator in *scanf

I'm currently building a bit of HTTP handling into a C program (compiled using glibc on Linux), which will sit behind an nginx instance, and figured I should be safe deferring argument tokenization to sscanf in this scenario.

I was very pleased to find that extracting the query out of the URI was straightforward:

char *path = "/events?a=1&b=2&c=3";

char query[64] = {0};

sscanf(path, "%*[^?]?%63s HTTP", query); // query = "a=1&b=2&c=3"; width 63 leaves room for the terminating NUL

but I was surprised how quickly things became i͏̠͚̣̗̲n͓̭̞̹t͈e҉̝̟̘̺r͈e̫st̩̟̠i͏͈͇n͏̠͍g̞͝   :(

int pos = -1;
char arg[32] = {0}, value[32] = {0};

int c = sscanf(query, "%31[^=]=%31[^&]&%n", arg, value, &pos); // width 31 leaves room for the NUL in each 32-byte buffer

For an input of a=1&b=2, I get arg="a", value="1", c=2, pos=4. Perfect: I can now rerun sscanf on query + pos to get the next argument. Why am I here?

Well, while a=1& behaves identically to the above, a=1 produces arg="a", value="1", c=2, and pos=-1. What do I make of this?

Scrambling for the documentation, I read that

       n      Nothing  is expected; instead, the number of characters consumed
              thus far from the input is  stored  through  the  next  pointer,
              which  must  be  a pointer to int.  This is not a conversion and
              does not increase the count returned by the function.   The  as‐
              signment  can  be  suppressed  with the * assignment-suppression
              character, but the effect on  the  return  value  is  undefined.
              Therefore %*n conversions should not be used.

where more than 50% of the paragraph refers to bookkeeping minutiae. The behavior I am seeing is not discussed.

Wandering around Google search results I quickly reached for Wikipedia's entry for Scanf_format_string (which was the top hit), but, uh...

[Image: Wikipedia's "format specification" section is empty as of June 2020]

Oookay... I feel like I'm in the tumbleweeds here, using a feature nobody really looks at. That doesn't do much for my remaining confidence.

Taking a look at what appears to be where %n is implemented in vfscanf-internal.c, I find that 60% of the code (lines) involves discussion regarding standards inconsistencies, 39.6% is implementation minutiae, and 0.4% is actual code (which consists in its entirety of "done++;").

It *appears* that glibc's behavior is to leave the internal value done (which I access using %n) untouched - or rather, undefined - unless some operation alters it. It also appears that using %n in this way was unforeseen and that I'm completely in "here be dragons" territory? :(

I don't think I'm going to be using scanf...

For the sake of completeness, here's something that wraps up what I'm seeing.

#include <stdio.h>

void test(const char *str) {
  int pos = -1;
  char arg[32] = {0}, value[32] = {0};
  // Widths are 31 so that the terminating NUL fits in the 32-byte buffers.
  int c = sscanf(str, "%31[^=]=%31[^&]&%n", arg, value, &pos);
  printf("\"%s\": c=%d arg=\"%s\" value=\"%s\" pos=%d\n", str, c, arg, value, pos);
}

int main(void) {
  test("a=1&b=2"); // "a=1&b=2": c=2 arg="a" value="1" pos=4
  test("a=1&");    // "a=1&": c=2 arg="a" value="1" pos=4
  test("a=1");     // "a=1": c=2 arg="a" value="1" pos=-1
}
i336_ asked Jun 29 '20 02:06


1 Answer

I think the C standard guarantees that the value of pos in your example remains unchanged.

C17 7.21.6.2 says, describing fscanf:

(4) The fscanf function executes each directive of the format in turn. When all directives have been executed, or if a directive fails (as detailed below), the function returns. Failures are described as input failures (due to the occurrence of an encoding error or the unavailability of input characters), or matching failures (due to inappropriate input).

[...]

(6) A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If any of those characters differ from the ones composing the directive, the directive fails and the differing and subsequent characters remain unread. Similarly, if end-of-file, an encoding error, or a read error prevents a character from being read, the directive fails.

("Multibyte character" here includes ordinary single-byte characters such as your &.)

So in your "a=1" example, the directives %32[^=], =, and %32[^&] all succeed, and now the end of the string has been reached. It's explained in 7.21.6.7 that for sscanf, "reaching the end of the string is equivalent to encountering end-of-file for the fscanf function." Hence no character can be read, so the & directive fails, and sscanf returns without doing anything further. The %n directive never executed, and so nothing happened that would have the right to modify the value of pos. Therefore it must have the same value it had before, namely -1.

I don't think this case was unforeseen; just that it's already covered by existing rules, and so nobody bothered to call it out explicitly.

Nate Eldredge answered Oct 18 '22 09:10