I do not understand why the regex pattern containing the \d character class does not work but [0-9] does. Character classes, such as \s (whitespace characters) and \w (word characters), do work. My compiler is gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3. I am using the C regular expression library.
Why doesn't \d work?
Text string:
const char *text = "148  apples    5 oranges";
For the above text string, this regex does not match:
const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";
This regex matches when using [0-9] instead of \d:
const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#define N_MATCHES  30
//   output from gcc --version: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
//   compile command used:  gcc -o tstc_regex tstc_regex.c
const char *text = "148  apples    5 oranges";
  const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";    // finds match
//const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";        // does not find match
int main(int argc, char**argv)
{
    regex_t   rgx;
    regmatch_t   matches[N_MATCHES];
    int status;
    status = regcomp(&rgx, rstr, REG_EXTENDED | REG_NEWLINE);
    if (status != 0) {
        fprintf(stdout, "regcomp error: %d\n", status);
        return 1;
    }
    status = regexec(&rgx, text, N_MATCHES, matches, 0);
    if (status == REG_NOMATCH) {
        fprintf(stdout, "regexec result: REG_NOMATCH (%d)\n", status);
    }
    else if (status != 0) {
        fprintf(stdout, "regexec error: %d\n", status);
        return 1;
    }
    else {
        fprintf(stdout, "regexec match found: %d\n", status);
    }
    regfree(&rgx);   // release the memory allocated by regcomp
    return 0;
}
The regex flavor you're using is GNU ERE, which is similar to POSIX ERE but adds a few extensions. Among these are the character class shorthands \s, \S, \w and \W, but not \d and \D, which is why \d fails while [0-9] works. You can find more info here.
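Neither pattern is likely to match in a strictly POSIX environment, because \s and \w are extensions as well. To make the pattern truly POSIX-compatible, use bracket expressions throughout (here [[:alpha:]] stands in for \w, which is enough for this input; the exact equivalent of \w is [[:alnum:]_]):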
const char *rstr = "^[[:digit:]]+[[:space:]]+[[:alpha:]]+[[:space:]]+[[:digit:]]+[[:space:]]+[[:alpha:]]+$";
↳ POSIX Character_classes
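To check the portable pattern yourself, here is a minimal sketch that wraps regcomp/regexec in a small helper. The names ere_matches and portable_pattern are made up for illustration and are not part of the question's code. The pattern uses [[:alnum:]_] as the bracket-expression equivalent of \w. Treat it as one possible way to test this, not the definitive one.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical pattern constant: the question's regex rewritten with
 * POSIX bracket expressions only ([[:alnum:]_] plays the role of \w). */
const char *const portable_pattern =
    "^[[:digit:]]+[[:space:]]+[[:alnum:]_]+[[:space:]]+"
    "[[:digit:]]+[[:space:]]+[[:alnum:]_]+$";

/* Hypothetical helper: returns 1 if text matches the extended regex
 * pattern, 0 if it does not match, and -1 if the pattern fails to
 * compile or regexec reports an error other than REG_NOMATCH. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t rgx;
    int status;

    /* REG_NOSUB: we only need a yes/no answer, not submatch offsets. */
    if (regcomp(&rgx, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }

    status = regexec(&rgx, text, 0, NULL, 0);
    regfree(&rgx);   /* always release the compiled pattern */

    if (status == 0) {
        return 1;
    }
    if (status == REG_NOMATCH) {
        return 0;
    }
    return -1;
}
```

Calling ere_matches(portable_pattern, "148  apples    5 oranges") from your main should report a match, and because the pattern contains no backslash shorthands it does not rely on any GNU extension.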