I need some help trying to match a C include file with full path like so:
#include <stdio.h> -> stdio.h
#include "monkey/chicken.h" -> monkey/chicken.h
So far I have (adapted from another expression I found):
^\s*\#include\s+(["'<])([^"'<>/\|\b]+)*([">])
But I'm kind of stuck at this point: it doesn't match in the second case, and I'm not sure how to get the result of the match (e.g. the file path) back after calling regcomp().
BTW I've looked at regexplib.com, but can't find anything suitable.
Edit: Yes I am a total regexp newbie, using POSIX regex with regmatch_t and friends...
There is no built-in support for regex in ANSI C.
The fundamental building blocks of a regex are patterns that match a single character. Most characters, including all letters (a-z and A-Z) and digits (0-9), match themselves. For example, the regex x matches the substring "x"; z matches "z"; and 9 matches "9".
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9.
Pattern matching in C: we have to find whether a string is present in another string. For example, the string "algorithm" is present within the string "naive algorithm". If it is found, its location (i.e. the position at which it occurs) is displayed.
Here's what I wrote :
#include ((<[^>]+>)|("[^"]+"))
Does it fit ?
This would give better results:
^\s*\#include\s+["<]([^">]+)*[">]
You then want to look at the first capture group when you get a match.
You don't say what language you're using, but the fact that you mention regcomp() leads me to believe that you're using the POSIX regex library in C. If that's right, then you want to call regexec() and use its nmatch and pmatch parameters to get the first capture group.
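As a rough sketch of that regexec() call: the helper name include_path and its buffer-based interface are made up for illustration, and the pattern is a lightly adapted version of the one above. It uses [[:space:]] instead of \s because POSIX extended regular expressions do not define \s (some implementations accept it as an extension). pmatch[0] describes the whole match and pmatch[1] the first capture group; rm_so and rm_eo are byte offsets into the subject string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copies the path named by an
 * #include directive in `line` into `out`. Returns 0 on success, -1 if the
 * line does not match or the buffer is too small. */
static int include_path(const char *line, char *out, size_t outsz)
{
    /* [[:space:]] is the portable POSIX spelling of "whitespace". */
    static const char *pattern =
        "^[[:space:]]*#[[:space:]]*include[[:space:]]*[\"<]([^\">]+)[\">]";
    regex_t re;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first capture group */
    int rc = -1;

    assert(line != NULL && out != NULL && outsz > 0);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch = 2 asks regexec() to fill both pmatch slots. */
    if (regexec(&re, line, 2, pmatch, 0) == 0 && pmatch[1].rm_so != -1) {
        /* rm_eo is one past the last matched byte. */
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len < outsz) {
            memcpy(out, line + pmatch[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Calling include_path("#include <stdio.h>", buf, sizeof buf) should leave "stdio.h" in buf. Compiling the pattern on every call keeps the sketch short; real code would probably compile once and reuse the regex_t.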
You can try this regex:
(^\s*\#\s*include\s*<([^<>]+)>)|(^\s*\#\s*include\s*"([^"]+)")
I prefer to have separate regexes for #include <> and #include "".
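If you want a more precise solution that also allows comments before the include directive, as in this example:
/* ops, a comment */ /* oh, another comment */ #include "new_header1.h" /* let's try another with an #include "old_header.h" */
then you can use the pattern below. It relies on non-capturing groups, lazy quantifiers, and named groups, none of which POSIX regcomp() supports, so it needs an engine with Perl-style extensions: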
^(?:\s*|\s*\/\*.*?\*\/)\s*#include\s*(?:(?:<)(?<PATH>.*?)(?:>)|(?:")(?<PATH>.*?)(?:"))
Not particularly well tested, but it matches your two cases:
^\s*#include\s+(<([^"'<>|]+)>|"([^"'<>|]+)")
The only problem is that, because the < > and " " forms are separate alternatives, the result could be in capture group 2 or 3, so you should check whether group 2 is empty and, if so, use group 3. The advantage over some of the other answers is that it won't match something like this: #include "bad.h> or this: #include <bad<<h>
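A minimal sketch of that "check group 2, else use group 3" step in plain C follows. The function name include_path_strict and its return convention are invented for this example, and the pattern is a POSIX-portable adaptation ([[:space:]] in place of \s) rather than the exact expression above. A group that did not take part in the match is reported by regexec() with rm_so set to -1, which is what the code tests.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the include path into `out` and returns the
 * opening delimiter ('<' or '"'), or 0 if the line is not a well-formed
 * #include directive or the buffer is too small. */
static char include_path_strict(const char *line, char *out, size_t outsz)
{
    /* Group 1 = whole <...> or "..." token, group 2 = path inside <>,
     * group 3 = path inside "". Only one of groups 2 and 3 can match. */
    static const char *pattern =
        "^[[:space:]]*#[[:space:]]*include[[:space:]]*"
        "(<([^<>\"]+)>|\"([^<>\"]+)\")";
    regex_t re;
    regmatch_t m[4];
    char kind = 0;

    assert(line != NULL && out != NULL && outsz > 0);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 4, m, 0) == 0) {
        /* rm_so == -1 marks a group that did not participate. */
        int g = (m[2].rm_so != -1) ? 2 : 3;
        size_t len = (size_t)(m[g].rm_eo - m[g].rm_so);
        if (len < outsz) {
            memcpy(out, line + m[g].rm_so, len);
            out[len] = '\0';
            kind = (g == 2) ? '<' : '"';
        }
    }

    regfree(&re);
    return kind;
}
```

With this pattern, mismatched delimiters such as #include "bad.h> are rejected, while #include "monkey/chicken.h" yields the path together with the '"' delimiter, which lets the caller distinguish system headers from local ones.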
And here's an example of how to use (wrap) regcomp & friends from C++:
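#include <regex.h>
#include <string>
#include <vector>

// Returns true if sSubject matches sRegEx. If vCaptureGroups is non-null, it
// receives the whole match at index 0 followed by one entry per capture group.
static bool regexMatch(const std::string& sRegEx, const std::string& sSubject, std::vector<std::string> *vCaptureGroups)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_ICASE;
    int status;
    // Without an output vector the match offsets are not needed.
    if(!vCaptureGroups) flags |= REG_NOSUB;
    if(regcomp(&re, sRegEx.c_str(), flags) != 0)
    {
        return false;
    }
    if(vCaptureGroups)
    {
        // re_nsub is the number of groups; slot 0 holds the whole match.
        size_t mlen = re.re_nsub + 1;
        std::vector<regmatch_t> rawMatches(mlen);
        status = regexec(&re, sSubject.c_str(), mlen, rawMatches.data(), 0);
        vCaptureGroups->clear();
        vCaptureGroups->reserve(mlen);
        if(status == 0)
        {
            for(size_t i = 0; i < mlen; i++)
            {
                if(rawMatches[i].rm_so == -1)
                {
                    // This group did not take part in the match.
                    vCaptureGroups->push_back(std::string());
                    continue;
                }
                // rm_eo is one past the last matched character, so the length is rm_eo - rm_so.
                vCaptureGroups->push_back(sSubject.substr(rawMatches[i].rm_so, rawMatches[i].rm_eo - rawMatches[i].rm_so));
            }
        }
    }
    else
    {
        status = regexec(&re, sSubject.c_str(), 0, NULL, 0);
    }
    regfree(&re);
    return (status == 0);
}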