
Properly using sscanf

Tags: c, input, token

I am supposed to read an input line that can be in any of the following formats:

word1
word1 word2
word1 word2 , word3
word1 word2,word3

  • There must be a space between word 1 and word 2.
  • There must be a comma between word 2 and word 3.
  • Spaces are not required between word 2 and word 3, but any number of spaces is possible.

How can I separate the 1-, 2- and 3-word cases and put the data into the correct variables?

I thought about something like:

sscanf("string", "%s %s,%s", word1, word2, word3);

but it doesn't seem to work.

I use strict C89.

Nahum asked Nov 28 '22 06:11


1 Answer

int n = sscanf("string", "%s %[^, ]%*[, ]%s", word1, word2, word3);

The return value in n tells you how many assignments were made successfully. The %[^, ] is a negated character-class match that finds a word not including either commas or blanks (add tabs if you like). The %*[, ] is a match that finds a comma or space but suppresses the assignment.
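
As a rough illustration of using that return value to tell the 1-, 2- and 3-word cases apart, here is a minimal sketch. The helper name split_line() and the 64-byte buffers are assumptions made for the example, not part of the answer, and field widths of 63 are added as a precaution against overflow.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the original answer).
 * Each buffer is assumed to hold at least 64 bytes.
 * Returns how many words were stored: 0, 1, 2 or 3.
 */
static int split_line(const char *line, char *w1, char *w2, char *w3)
{
    int n;

    /* Empty strings mark the words that were not present. */
    w1[0] = w2[0] = w3[0] = '\0';

    /* Same conversions as the format above, with widths of 63
       added to suit the assumed 64-byte buffers. */
    n = sscanf(line, "%63s %63[^, ]%*[, ]%63s", w1, w2, w3);

    /* sscanf() returns EOF if the input ends before any conversion. */
    return (n == EOF) ? 0 : n;
}
```

A caller could then switch on the result: 1 means only w1 is valid, 2 means w1 and w2, and 3 means all three words were read.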

I'm not sure I'd use this in practice, but it should work. It is, however, untested.


Maybe a tighter specification is:

int n = sscanf("string", "%s %[^, ]%*[,]%s", word1, word2, word3);

The difference is that the non-assigning character class only accepts a comma. sscanf() stops at any space (or EOS, end of string) after word2, and skips spaces before assigning to word3. The previous edition allowed a space between the second and third words in lieu of a comma, which the question does not strictly allow.
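
To make the difference concrete, the sketch below wraps each format in a small function and returns the sscanf() count. The function names, the 64-byte buffers and the widths of 63 are assumptions made for the example.

```c
#include <stdio.h>

/* First format: the separator may be commas and/or spaces. */
static int scan_loose(const char *line)
{
    char w1[64], w2[64], w3[64];
    return sscanf(line, "%63s %63[^, ]%*[, ]%63s", w1, w2, w3);
}

/* Tighter format: the separator must be a comma. */
static int scan_tight(const char *line)
{
    char w1[64], w2[64], w3[64];
    return sscanf(line, "%63s %63[^, ]%*[,]%63s", w1, w2, w3);
}
```

For the input "word1 word2 word3" (no comma), scan_loose() returns 3 while scan_tight() returns 2, because %*[,] fails to match the space after word2. For "word1 word2,word3" both return 3.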

As pmg suggests in a comment, the assigning conversion specifications should be given a length to prevent buffer overflow. Note that the length does not include the null terminator, so the value in the format string must be one less than the size of the arrays in bytes. Also note that whereas printf() allows you to specify sizes dynamically with *, sscanf() et al use * to suppress assignment. That means you have to create the string specifically for the task at hand:

char word1[20], word2[32], word3[64];
int n = sscanf("string", "%19s %31[^, ]%*[,]%63s", word1, word2, word3);

(Kernighan & Pike suggest formatting the format string dynamically in their excellent book 'The Practice of Programming', 1999.)
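
One possible way to build the format string at run time, in the spirit of that suggestion, is with sprintf(). This is only a sketch: make_format() and parse_line() are hypothetical names, and the code assumes the format buffer is large enough and that every array size is at least 2.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes a format such as "%19s %31[^, ]%*[,]%63s"
 * into fmt, using each buffer size minus one as the field width.
 * Assumes fmt has room for the result (64 bytes is plenty here)
 * and that every size is at least 2.
 */
static void make_format(char *fmt, size_t s1, size_t s2, size_t s3)
{
    /* "%%" yields a literal '%'; C89 has no %zu, so cast to unsigned long. */
    sprintf(fmt, "%%%lus %%%lu[^, ]%%*[,]%%%lus",
            (unsigned long)(s1 - 1),
            (unsigned long)(s2 - 1),
            (unsigned long)(s3 - 1));
}

/* Example use: the widths follow the array sizes automatically. */
static int parse_line(const char *line)
{
    char word1[20], word2[32], word3[64];
    char fmt[64];

    make_format(fmt, sizeof(word1), sizeof(word2), sizeof(word3));
    return sscanf(line, fmt, word1, word2, word3);
}
```

The point of the design is that changing an array size no longer requires editing the widths by hand. Some compilers warn about a non-literal format string, which is expected here.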


Just found a problem: given "word1 word2 ,word3", it doesn't read word3. Is there a cure?

Yes, there's a cure, and it is actually trivial, too. Add a space in the format string before the non-assigning, comma-matching conversion specification. Thus:

#include <stdio.h>

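/* Parse one line and report how many words sscanf() assigned. */
static void tester(const char *data)
{
    /* Start with empty strings so that printing is safe when n < 3. */
    char word1[20] = "", word2[32] = "", word3[64] = "";
    int n = sscanf(data, "%19s %31[^, ] %*[,]%63s", word1, word2, word3);
    printf("Test data: <<%s>>\n", data);
    printf("n = %d; w1 = <<%s>>, w2 = <<%s>>, w3 = <<%s>>\n", n, word1, word2, word3);
}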

int main(void)
{
    const char *data[] =
    {
        "word1 word2 , word3",
        "word1 word2 ,word3",
        "word1 word2, word3",
        "word1 word2,word3",
        "word1 word2       ,       word3",
    };
    enum { DATA_SIZE = sizeof(data)/sizeof(data[0]) };
    size_t i;
    for (i = 0; i < DATA_SIZE; i++)
        tester(data[i]);
    return(0);
}

Example output:

Test data: <<word1 word2 , word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2 ,word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2, word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2,word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2       ,       word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>

Once the 'non-assigning character class' only accepts a comma, you can abbreviate that to a literal comma in the format string:

int n = sscanf(data, "%19s %31[^, ] , %63s", word1, word2, word3);

Plugging that into the test harness produces the same result as before. Note that all code benefits from review; it can often (essentially always) be improved even after it is working.
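
One subtlety worth checking when swapping the two forms: %*[,] consumes one or more commas, whereas a literal comma in the format matches exactly one. The sketch below compares them; the function names are assumptions for the example, and w3 is assumed to point at a buffer of at least 64 bytes.

```c
#include <stdio.h>

/* Scanset version: %*[,] consumes one or more commas. */
static int scan_scanset(const char *line, char *w3)
{
    char w1[20], w2[32];
    w3[0] = '\0';
    return sscanf(line, "%19s %31[^, ] %*[,]%63s", w1, w2, w3);
}

/* Literal version: the ',' in the format matches exactly one comma. */
static int scan_literal(const char *line, char *w3)
{
    char w1[20], w2[32];
    w3[0] = '\0';
    return sscanf(line, "%19s %31[^, ] , %63s", w1, w2, w3);
}
```

For the five test strings above the two behave identically. For input with a doubled comma, such as "word1 word2,,word3", both return 3, but the scanset version stores "word3" in w3 while the literal version stores ",word3". Which behaviour is preferable depends on how strictly malformed input should be treated.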

Jonathan Leffler answered Dec 31 '22 13:12