I am inexperienced with using C, and I need to use PCRE to get matches.
Here is a sample of my source code:
int test2()
{
const char *error;
int erroffset;
pcre *re;
int rc;
int i;
int ovector[OVECCOUNT];
char *regex = "From:([^@]+)@([^\r]+)";
char str[] = "From:[email protected]\r\n"\
"From:[email protected]\r\n"\
"From:[email protected]\r\n";
re = pcre_compile (
regex, /* the pattern */
0, /* default options */
&error, /* for error message */
&erroffset, /* for error offset */
0); /* use default character tables */
if (!re) {
printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
return -1;
}
rc = pcre_exec (
re, /* the compiled pattern */
0, /* no extra data - pattern was not studied */
str, /* the string to match */
strlen(str), /* the length of the string */
0, /* start at offset 0 in the subject */
0, /* default options */
ovector, /* output vector for substring information */
OVECCOUNT); /* number of elements in the output vector */
if (rc < 0) {
switch (rc) {
case PCRE_ERROR_NOMATCH:
printf("String didn't match");
break;
default:
printf("Error while matching: %d\n", rc);
break;
}
free(re);
return -1;
}
for (i = 0; i < rc; i++) {
printf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
}
}
In this demo, the output is only:
0: From:[email protected]
1: regular.expressions
2: example.com
I want to output all of the matches; how can I do that?
The dot matches all except newlines (\r\n). So use \s\S, which will match ALL characters.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
PCRE (Perl Compatible Regular Expressions) is a C library implementing regex. It was written in 1997 when Perl was the de-facto choice for complex text processing tasks. The syntax for patterns used in PCRE closely resembles Perl. PCRE syntax is being used in many big projects including PHP, Apache, R to name a few.
Regular expressions allow us to not just match text but also to extract information for further processing. This is done by defining groups of characters and capturing them using the special parentheses ( and ) metacharacters. Any subpattern inside a pair of parentheses will be captured as a group.
I use a class to wrap PCRE to make this easier, but after the pcre_exec, the ovector contains the substring indexes you need to find the matches within the original string.
So it would be something like:
#include <string>
#include <iostream>
#include "pcre.h"
int main (int argc, char *argv[])
{
const char *error;
int erroffset;
pcre *re;
int rc;
int i;
int ovector[100];
char *regex = "From:([^@]+)@([^\r]+)";
char str[] = "From:[email protected]\r\n"\
"From:[email protected]\r\n"\
"From:[email protected]\r\n";
re = pcre_compile (regex, /* the pattern */
PCRE_MULTILINE,
&error, /* for error message */
&erroffset, /* for error offset */
0); /* use default character tables */
if (!re)
{
printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
return -1;
}
unsigned int offset = 0;
unsigned int len = strlen(str);
while (offset < len && (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, sizeof(ovector))) >= 0)
{
for(int i = 0; i < rc; ++i)
{
printf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
}
offset = ovector[1];
}
return 1;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With