
Regular expression to match C #include file

Tags:

regex

I need some help trying to match a C include file with full path like so:

#include <stdio.h>  -> stdio.h
#include "monkey/chicken.h" -> monkey/chicken.h

So far I have (adapted from another expression I found):

^\s*\#include\s+(["'<])([^"'<>/\|\b]+)*([">])

But I'm stuck at this point: it doesn't match the second case, and I'm not sure how to get the result of the match (e.g. the file path) back after calling regcomp().

BTW I've looked at regexplib.com, but can't find anything suitable.

Edit: Yes I am a total regexp newbie, using POSIX regex with regmatch_t and friends...

Justicle asked Sep 14 '09 06:09



5 Answers

Here's what I wrote:

#include ((<[^>]+>)|("[^"]+"))

Does it fit?

Clement Herreman answered Nov 09 '22 12:11


This would give better results:

^\s*\#include\s+["<]([^">]+)*[">]

You then want to look at the first capture group when you get a match.

You don't say what language you're using, but the fact that you mention regcomp() leads me to believe that you're using the POSIX regex library in C. If that's right, then you want to use the regexec function and use its nmatch and pmatch parameters to get the first capture group.
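
To make the regexec step concrete, here is a minimal sketch in C, assuming a POSIX <regex.h> implementation. The helper name include_path and its buffer-based interface are illustrative assumptions, not something from the question. The pattern is the one above with two changes: \s is written as [[:space:]], because \s is not part of POSIX extended syntax (some libraries accept it as an extension), and the redundant * after the group is dropped.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Copies the path from an #include line into out (capacity outsz).
 * Returns 0 on success, -1 if the line does not match or out is too small.
 * include_path is a hypothetical helper name used for illustration. */
static int include_path(const char *line, char *out, size_t outsz)
{
    /* Capture group 1 is the text between the delimiters. */
    static const char *pattern =
        "^[[:space:]]*#include[[:space:]]+[\"<]([^\">]+)[\">]";
    regex_t re;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first capture group */
    int rc = -1;

    assert(line != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch = 2 asks regexec to fill in pmatch[0] and pmatch[1]. */
    if (regexec(&re, line, 2, pmatch, 0) == 0 && pmatch[1].rm_so != -1) {
        /* rm_so is the start offset, rm_eo is one past the last character. */
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len < outsz) {
            memcpy(out, line + pmatch[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Calling include_path("#include <stdio.h>", buf, sizeof buf) should leave "stdio.h" in buf. Compiling the pattern on every call keeps the sketch short; a real program would probably compile it once and reuse the regex_t.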

Laurence Gonsalves answered Nov 09 '22 10:11


You can try this regex:

(^\s*\#\s*include\s*<([^<>]+)>)|(^\s*\#\s*include\s*"([^"]+)")

I prefer to have separate regexes for
#include <>
and
#include ""

Nick Dandoulakis answered Nov 09 '22 10:11


If you want a more precise solution that also allows comments before the include directive, as in this example:

  /* ops, a comment */ /* oh, another comment */   #include  "new_header1.h" /* let's try another with an #include "old_header.h" */

then the regex is:

^(?:\s*|\s*\/\*.*?\*\/)\s*#include\s*(?:(?:<)(?<PATH>.*?)(?:>)|(?:")(?<PATH>.*?)(?:"))
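
Note that named groups such as (?<PATH>...) and lazy quantifiers such as .*? are not part of POSIX regex syntax, so this expression is meant for engines that support them rather than for regcomp(). A rough POSIX extended equivalent might look like the sketch below. It assumes that any number of /* ... */ comments may precede the directive, and the helper name include_path_after_comments is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Extracts the include path from a line that may start with C comments.
 * Returns 0 on success, -1 on no match or if out is too small.
 * include_path_after_comments is a hypothetical name for illustration. */
static int include_path_after_comments(const char *line, char *out, size_t outsz)
{
    /* Groups: 1 = one leading comment, 2 = a piece of comment body,
     *         3 = delimited path, 4 = path inside <>, 5 = path inside "". */
    static const char *pattern =
        "^([[:space:]]*/\\*([^*]|\\*+[^*/])*\\*+/)*"   /* leading comments */
        "[[:space:]]*#[[:space:]]*include[[:space:]]*"
        "(<([^<>]+)>|\"([^\"]+)\")";
    regex_t re;
    regmatch_t m[6];
    int rc = -1;

    assert(line != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 6, m, 0) == 0) {
        /* Only one of groups 4 and 5 takes part in a successful match. */
        const regmatch_t *g = (m[4].rm_so != -1) ? &m[4] : &m[5];
        if (g->rm_so != -1) {
            size_t len = (size_t)(g->rm_eo - g->rm_so);
            if (len < outsz) {
                memcpy(out, line + g->rm_so, len);
                out[len] = '\0';
                rc = 0;
            }
        }
    }

    regfree(&re);
    return rc;
}
```

The comment sub-pattern cannot run past the first */, which plays the role of the lazy .*? in the expression above. An #include that appears only inside a comment is not reported, because the directive has to follow the leading comments.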

Drake answered Nov 09 '22 10:11


Not particularly well tested, but it matches your two cases:

^\s*#include\s+(<([^"'<>|\b]+)>|"([^"'<>|\b]+)")

The only problem is that, because the <...> and "..." forms are separate alternatives, the result could be in capture group 2 or 3, so you should check whether 2 is empty and, if it is, use 3. The advantage over some of the other answers is that it won't match something like this: #include "bad.h> or this: #include <bad<<h>
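
The group-2-or-3 check can be sketched in plain C as follows, assuming a POSIX <regex.h>; the helper name include_path_strict is hypothetical. Two details differ from the expression above. \s is spelled [[:space:]], since \s is not part of POSIX extended syntax. The \b is left out of the bracket expressions, because inside [...] a backslash is an ordinary character in POSIX syntax, so \b there would exclude a literal backslash and the letter b instead of acting as a word boundary.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Extracts the include path, accepting only matching delimiters.
 * Returns 0 on success, -1 on no match or if out is too small.
 * include_path_strict is a hypothetical name for illustration. */
static int include_path_strict(const char *line, char *out, size_t outsz)
{
    /* Groups: 1 = delimited path, 2 = path inside <>, 3 = path inside "". */
    static const char *pattern =
        "^[[:space:]]*#include[[:space:]]+"
        "(<([^\"'<>|]+)>|\"([^\"'<>|]+)\")";
    regex_t re;
    regmatch_t m[4];
    int rc = -1;

    assert(line != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 4, m, 0) == 0) {
        /* A group that did not take part in the match has rm_so == -1. */
        const regmatch_t *g = (m[2].rm_so != -1) ? &m[2] : &m[3];
        if (g->rm_so != -1) {
            size_t len = (size_t)(g->rm_eo - g->rm_so);
            if (len < outsz) {
                memcpy(out, line + g->rm_so, len);
                out[len] = '\0';
                rc = 0;
            }
        }
    }

    regfree(&re);
    return rc;
}
```

With this sketch, #include <stdlib.h> yields stdlib.h, while the two malformed lines mentioned above are rejected.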

And here's an example of how to use (wrap) regcomp & friends from C++:

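 #include <regex.h>
 #include <string>
 #include <vector>

 // Returns true if sSubject matches sRegEx (POSIX extended syntax, case-insensitive).
 // If vCaptureGroups is non-null, it receives the whole match (index 0) followed by
 // each capture group; a group that took no part in the match becomes an empty string.
 static bool regexMatch(const std::string& sRegEx, const std::string& sSubject, std::vector<std::string> *vCaptureGroups)
 {
  regex_t re;
  int flags = REG_EXTENDED | REG_ICASE;
  int status;

  // Without an output vector there is no need to track sub-matches.
  if(!vCaptureGroups) flags |= REG_NOSUB;

  if(regcomp(&re, sRegEx.c_str(), flags) != 0)
  {
   return false;
  }

  if(vCaptureGroups)
  {
   // re_nsub counts the parenthesised groups; add one for the whole match.
   const size_t mlen = re.re_nsub + 1;
   std::vector<regmatch_t> rawMatches(mlen);

   status = regexec(&re, sSubject.c_str(), mlen, &rawMatches[0], 0);

   vCaptureGroups->clear();
   vCaptureGroups->reserve(mlen);

   if(status == 0)
   {
    for(size_t i = 0; i < mlen; i++)
    {
     if(rawMatches[i].rm_so < 0)
     {
      // Unused group: rm_so is -1, so there is nothing to copy.
      vCaptureGroups->push_back(std::string());
      continue;
     }
     // rm_eo is one past the last matched character, so the length is rm_eo - rm_so.
     vCaptureGroups->push_back(sSubject.substr(rawMatches[i].rm_so, rawMatches[i].rm_eo - rawMatches[i].rm_so));
    }
   }
  }
  else
  {
   status = regexec(&re, sSubject.c_str(), 0, NULL, 0);
  }

  regfree(&re);

  return (status == 0);
 }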

KiNgMaR answered Nov 09 '22 10:11