
Detecting mismatches against constants in scanf format string

Tags: c, io, scanf

From the man page of scanf:

A directive is one of the following:

  • A sequence of white-space characters (space, tab, newline, etc.; see isspace(3)). This directive matches any amount of white space, including none, in the input.

  • An ordinary character (i.e., one other than white space or '%'). This character must exactly match the next character of input. (emphasis mine)

  • A conversion specification, which commences with a '%' (percent) character. A sequence of characters from the input is converted according to this specification, and the result is placed in the corresponding pointer argument. If the next item of input does not match the conversion specification, the conversion fails—this is a matching failure.

Now, consider the following code:

#include <stdio.h>

int main(void)
{
    const char* fmt = "A %49s B";
    char buf[50];

    printf("%d\n", sscanf("A foo B", fmt, buf));            // 1
    printf("%d\n", sscanf("blah blaaah blah", fmt, buf));   // 0
    printf("%d\n", sscanf("A blah blah", fmt, buf));        // 1

    return 0;
}

The first and third calls print 1 because matching "A" with "A" succeeds, as does matching "foo"/"blah" with %s. The second call prints 0 because "A" cannot be matched with "blah", so parsing stops there.

This is all fine and logical, but is there any way for me to detect that a matching failure occurred after all conversion specifications have been successfully matched and assigned? In that case, the value returned by scanf will be the number of conversion specifiers in my format string, so I can't use it to tell if matching succeeded till the very end.

In other words: the string fed to sscanf in the third call is not "valid" in the sense that it's not in the format A [something] B. Can I use scanf to detect this, or is strtok my only option?

user4520 asked Mar 12 '23 04:03



1 Answer

Employ a " %n" at the end of the format.

Directives:
" " scans 0 or more white-space. It does not fail.
"%n" stores the number of characters consumed so far (as an int). It does not fail, and it does not add to the value returned by sscanf().

Initialize n to 0 before the call and test whether it changed afterwards. It changes only if every directive before "%n" matched. Also test that the scan ended on the null character, s[n] == '\0', which detects unwanted trailing text.

The added " ", though optional, is very useful because trailing white-space, which is often a '\n', is usually harmless. It removes the need to strip the line ending from a scanned line of text before parsing it.

#include <stdio.h>

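void test(const char *s) {
  // The trailing " %n" records how many characters were consumed,
  // but only if everything before it matched.
  const char* fmt = "A %49s B %n";
  char buf[50];
  int n = 0;  // stays 0 unless the scan reaches %n
  int cnt = sscanf(s, fmt, buf, &n);
  // Success: %n was reached and nothing follows the matched text.
  int success = n > 0 && s[n] == '\0';
  printf("sscanf():%2d  n:%2d  success:%d  '%s'\n", cnt, n, success, s);
}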

int main(void) {
  test("A foo B");
  test("blah blaaah blah");
  test("A blah blah");
  test("A foo B ");
  test("A foo B x");
  test("");
  return 0;
}

Output

sscanf(): 1  n: 7  success:1  'A foo B'
sscanf(): 0  n: 0  success:0  'blah blaaah blah'
sscanf(): 1  n: 0  success:0  'A blah blah'
sscanf(): 1  n: 8  success:1  'A foo B '
sscanf(): 1  n: 8  success:0  'A foo B x'
sscanf():-1  n: 0  success:0  ''

Note that success is determined by n alone (its value and the character at s[n]), not by the return value. On lack of success, destination variables like buf should not be used. If a partial result is needed, then use the return value of sscanf() to see how many conversions were assigned.

chux - Reinstate Monica answered Apr 06 '23 22:04
