
How can I use PCRE to get all match groups?

I am new to C and need to use PCRE to extract matches.
Here is a sample of my source code:

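#include <stdio.h>
#include <string.h>
#include <pcre.h>

#define OVECCOUNT 30    /* assumed value, not shown in the original sample; should be a multiple of 3 */

int test2()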
{
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   i;
    int   ovector[OVECCOUNT];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[]  = "From:regular.expressions@example.com\r\n"\
                  "From:[email protected]\r\n"\
                  "From:[email protected]\r\n";

    re = pcre_compile (
             regex,       /* the pattern */
             0,                    /* default options */
             &error,               /* for error message */
             &erroffset,           /* for error offset */
             0);                   /* use default character tables */

    if (!re) {
        printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
        return -1;
    }

    rc = pcre_exec (
        re,                   /* the compiled pattern */
        0,                    /* no extra data - pattern was not studied */
        str,                  /* the string to match */
        strlen(str),          /* the length of the string */
        0,                    /* start at offset 0 in the subject */
        0,                    /* default options */
        ovector,              /* output vector for substring information */
        OVECCOUNT);           /* number of elements in the output vector */

    if (rc < 0) {
        switch (rc) {
            case PCRE_ERROR_NOMATCH:
                printf("String didn't match\n");
                break;

            default:
                printf("Error while matching: %d\n", rc);
                break;
        }
        pcre_free(re);   /* memory from pcre_compile must be released with pcre_free, not free */
        return -1;
    }

    for (i = 0; i < rc; i++) {
        printf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
    }

    pcre_free(re);
    return 0;
}

In this demo, the output is only:

 0: From:regular.expressions@example.com
1: regular.expressions
2: example.com

I want to output all of the matches; how can I do that?

asked Sep 14 '09 at 14:09 by tbmvp


People also ask

How do you match everything including newline regex?

By default, the dot matches every character except a newline. To match every character including newlines, use the character class [\s\S], or enable dot-all mode (PCRE_DOTALL in PCRE).

How do I capture a group in regex?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

What is PCRE pattern?

PCRE (Perl Compatible Regular Expressions) is a C library implementing regex. It was written in 1997 when Perl was the de-facto choice for complex text processing tasks. The syntax for patterns used in PCRE closely resembles Perl. PCRE syntax is being used in many big projects including PHP, Apache, R to name a few.

What is regex match group?

Regular expressions allow us to not just match text but also to extract information for further processing. This is done by defining groups of characters and capturing them using the special parentheses ( and ) metacharacters. Any subpattern inside a pair of parentheses will be captured as a group.


1 Answer

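I wrap PCRE in a class to make this easier, but the idea is simple. pcre_exec finds only one match per call. After it returns, ovector holds the start and end offsets, within the original string, of the whole match and of each capture group. To get every match, call pcre_exec in a loop and start each search at the end of the previous match (ovector[1]).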
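To make the offset layout concrete, here is a minimal sketch that needs no PCRE at all. It assumes an ovector already filled in by a successful pcre_exec call; the helper name copy_group is hypothetical and not part of the PCRE API. Group i occupies ovector[2*i] (start) and ovector[2*i+1] (end, exclusive), and an unset group is reported as -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): copy capture group `group` out of
   `subject` into `buf`, using the (start, end) offset pairs that pcre_exec()
   leaves in `ovector`. Returns the group's length, or -1 if the group is
   unset or does not fit in `buf`. */
int copy_group(const char *subject, const int *ovector, int group,
               char *buf, size_t bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL && group >= 0);

    int start = ovector[2 * group];      /* offset of the first byte of the group */
    int end   = ovector[2 * group + 1];  /* offset one past the last byte */

    if (start < 0 || end < start)        /* unset groups are reported as -1 */
        return -1;
    if ((size_t)(end - start) >= bufsize) /* leave room for the terminator */
        return -1;

    memcpy(buf, subject + start, (size_t)(end - start));
    buf[end - start] = '\0';
    return end - start;
}
```

For the first line of the sample input, the whole match spans offsets 0 to 36, group 1 spans 5 to 24, and group 2 spans 25 to 36, so calling copy_group with group set to 1 yields "regular.expressions". Since PCRE uses the last third of the vector as workspace, a pattern with n capture groups needs an ovector of at least 3 * (n + 1) ints to report every group.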

So it would be something like:

#include <stdio.h>    /* printf */
#include <string.h>   /* strlen */
#include "pcre.h"

int main (int argc, char *argv[])
{
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   i;
    int   ovector[100];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[]  = "From:regular.expressions@example.com\r\n"\
                  "From:[email protected]\r\n"\
                  "From:[email protected]\r\n";

    re = pcre_compile (regex,          /* the pattern */
                       PCRE_MULTILINE,
                       &error,         /* for error message */
                       &erroffset,     /* for error offset */
                       0);             /* use default character tables */
    if (!re)
    {
        printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
        return -1;
    }

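    int offset = 0;
    int len    = (int)strlen(str);
    /* pcre_exec expects the number of ints in ovector, not its size in bytes */
    const int ovecsize = sizeof(ovector) / sizeof(ovector[0]);
    while (offset < len && (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0)
    {
        for(int i = 0; i < rc; ++i)
        {
            printf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
        }
        offset = ovector[1];   /* resume the search just past this match */
    }
    pcre_free(re);
    return 0;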
}
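
One caveat with advancing to ovector[1]: the pattern here always consumes at least "From:", but a pattern that can match the empty string would leave the offset unchanged and loop forever. The following is a minimal sketch of a guard, assuming single-byte characters; the helper name next_start_offset is hypothetical and not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): choose the start offset for the
   next pcre_exec() call. ovector[0] and ovector[1] are the start and end
   offsets of the match that was just found. */
int next_start_offset(const int *ovector)
{
    assert(ovector != NULL);

    int start = ovector[0];
    int end   = ovector[1];

    /* A zero-length match ends where it began; step one byte forward so the
       loop cannot find the same empty match again. */
    return (end == start) ? end + 1 : end;
}
```

In the loop above this would replace the assignment with offset = next_start_offset(ovector). Stepping a single byte is a simplification: it is not safe in UTF-8 mode and can skip a match adjacent to an empty one, so treat it as a starting point only.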

answered Oct 03 '22 at 20:10 by RC.