
scanf format specifier to read zero or more characters from a set of characters

I need to be very strict in regard to the characters that can be in a read string.

I have a series of whitespace followed by a character followed by a series of whitespace.
Examples: " c ", "c" , "", " "

I need to find a format specifier that allows me to ignore the character but only if it is this particular character and not any other character. This sequence " e " should be aborted.

I tried " %*[c] " but my unit tests fail for some scenarios, which leads me to believe that " %*[c] " looks for one or more 'c' instead of zero or more 'c'.

I wrote a minimal example to illustrate my problem better. Keep in mind that this is only a minimal example. The central issue is how to parse zero or one occurrence of a single character.

#include <stdio.h>
#include <string.h>

unsigned match(const char * formula){
    unsigned e = 0, found = 0, s;
    char del;
    int parsed, pos, len = (int) strlen(formula); 
    const size_t soc = sizeof( char );
    del = ' ';
    parsed = sscanf_s( formula, " \" %*[(] X%*[^>]>> %u %*[)] %c %n", &s, &del, soc, &pos );// (X >> s )
    if( ( 2 == parsed ) && ( pos == len) && ( '"' == del ) ){
        printf("%6s:%s\n", "OK", formula);
    }else{
        printf("%6s:%s\n", "FAIL", formula);
        e += 1;
    }
    return e;
}

int main( void )
{
    unsigned e = 0;

    printf("SHOULD BE OK\n");
    e += match("     \"X >> 3\""); //This one does not feature the optional characters
    e += match("     \"( X >> 3 ) \"");
    e += match("     \"( X >> 3 ) \"\r");

    printf("SHOULD FAIL\n");
    if ( 0 == match("     \"( Y >> 3 ) \"") ) e += 1;
    if ( 0 == match("     \"g X >> 3 ) \"") ) e += 1;
    if ( 0 == match("     \"( X >> 3.3-4.2 ) \"") ) e += 1;

    if( 0 != e ){ printf( "ERRORS: %2u\n", e ); }
    else{ printf( "all pass\n" ); }
    return e;
}
Johannes asked Oct 04 '22 01:10


1 Answer

As others have indicated, sscanf is not well suited to this purpose. A %[ conversion must match at least one character; if it matches none, the directive fails and sscanf stops at that point. That is why your format cannot handle the "optional" ( that may or may not appear between the " and the X. With scanf, if an optional field has no delimiter to indicate that it is missing, the only way to determine that it is missing is to try to parse it, notice it is not there, and parse again with a different format string.

parsed = sscanf( formula, " \" %*[(] X%*[^>]>> %u %*[)] %c %n", &s, &del, &pos );
if (parsed != 2) {
    parsed = sscanf( formula, " \" X%*[^>]>> %u %c %n", &s, &del, &pos );
}

The rest of this answer shows how to parse the string with POSIX basic regular expressions from <regex.h> instead.

First, you need to define your regular expression and compile it.

const char *re =
    "[ \t]*\""                 /* match up to '"' */
    "[ \t]*(\\{0,1\\}[ \t]*"   /* match '(' if present */
    "X[ \t]*>>[ \t]*"          /* match 'X >>' */
    "\\([0-9][0-9]*\\)"        /* match number as subexpression */
    "[ \t]*)\\{0,1\\}[ \t]*"   /* match ')' if present */
    "\\(.\\)"                  /* match final delimiter as subexpression */
    "[ \t\r\n]*";              /* match trailing whitespace */
regex_t reg;
int r = regcomp(&reg, re, 0);
if (r != 0) {
    char buf[256];
    regerror(r, &reg, buf, sizeof(buf));
    fprintf(stderr, "regcomp: %s\n", buf);
    /*...*/
}

Next, execute the compiled expression against the string you want to match. regcomp records the number of parenthesized subexpressions in your regular expression in reg.re_nsub. That count does not include one implicit entry: the substring matched by the complete expression, which regexec always stores in the first element of the match array (matches[0]). Size the array to account for it; that is why the matches array has re_nsub + 1 elements.

unsigned match(const regex_t *preg, const char * formula){
    /*...*/
    int r;
    const int NSUB = preg->re_nsub + 1;
    regmatch_t matches[NSUB];

    r = regexec(preg, formula, NSUB, matches, 0);
    if (r == 0) {
        /* success */
        parsed = preg->re_nsub;
        s = atoi(formula + matches[1].rm_so);
        del = formula[matches[2].rm_so];
        pos = matches[0].rm_eo;
    } else {
        parsed = 0;
    }
    /*...*/
}

When you are done with the regular expression, you should free it (if it was successfully compiled).

regfree(&reg);
jxh answered Oct 13 '22 11:10