
python regex to match multi-line preprocessor macro

Tags: python, regex

What follows is a regular expression I have written to match multi-line pre-processor macros in C / C++ code. I'm by no means a regular expressions guru, so I'd welcome any advice on how I can make this better.

Here's the regex:

\s*#define(.*\\\n)+[\S]+(?!\\)

It should match all of this:

#define foo(x) if(x) \
doSomething(x)

But only part of this (it shouldn't match the last line, which is normal code):

#define foo(x) if(x) \
doSomething(x)
normalCode();

And also shouldn't match single-line preprocessor macros.

I'm pretty sure that the regex above works, but as I said, there is probably a better way of doing it, and I imagine that there are ways of breaking it. Can anyone suggest any?
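As a quick check of where the pattern above can break, here is a minimal sketch. The two input strings are made up for illustration and are not from the question; they probe the `[\S]+` part, which has to start matching immediately after the last backslash-newline and stops at the first whitespace character.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"\s*#define(.*\\\n)+[\S]+(?!\\)")

# Hypothetical input 1: the continuation line is indented.
indented = "#define foo(x) if(x) \\\n    doSomething(x)\n"

# Hypothetical input 2: the last line of the macro contains a space.
spaced = "#define foo(x) if(x) \\\ndoSomething(x, y)\n"

# [\S]+ cannot start on a space, so the indented macro is not matched at all.
indented_match = pattern.search(indented)
print(indented_match)

# [\S]+ stops at the first space, so the match is cut off after "doSomething(x,".
spaced_match = pattern.search(spaced).group(0)
print(repr(spaced_match))
```

Both cases suggest that the final line is better matched with something like `.*` than with `[\S]+`.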

Thomi asked Sep 13 '08 16:09



2 Answers

This is a simple test program I knocked up:

#!/usr/bin/env python

TEST1="""
#include "Foo.h"
#define bar foo\\
    x
#include "Bar.h"
"""

TEST2="""
#define bar foo
#define x 1 \\
    12 \\
    2 \\\\ 3
Foobar
"""

TEST3="""
#define foo(x) if(x) \\
doSomething(x)
"""

TEST4="""
#define foo(x) if(x) \\
doSomething(x)
normalCode();
"""

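import re

# [ \t]* allows indentation before #define without consuming preceding newlines.
matcher = re.compile(r"^[ \t]*#define(.*\\\n)+.*$", re.MULTILINE)

def extractDefines(s):
    # Print the first multi-line macro found in s, or None if there is none.
    mo = matcher.search(s)
    if not mo:
        print(mo)
        return
    print(mo.group(0))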

extractDefines(TEST1)
extractDefines(TEST2)
extractDefines(TEST3)
extractDefines(TEST4)

The re I used:

r"^[ \t]*#define(.*\\\n)+.*$"

is very similar to the one you used. The changes:

  1. [ \t] instead of \s, so that the match cannot swallow newlines before the start of the define.
  2. I rely on + being greedy, so a simple .*$ at the end picks up the first line of the define that doesn't end with \
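To make the two changes concrete, here is a small sketch comparing the pattern above with a variant that keeps `\s*` at the front. The names `strict` and `loose` and the sample source are illustrative assumptions, not part of the original test program.

```python
import re

# Pattern from this answer: only spaces and tabs may precede #define.
strict = re.compile(r"^[ \t]*#define(.*\\\n)+.*$", re.MULTILINE)

# Same pattern, but with \s* at the front, as in the question.
loose = re.compile(r"^\s*#define(.*\\\n)+.*$", re.MULTILINE)

# Hypothetical source: a blank line, then a three-line macro, then normal code.
source = "int a;\n\n#define x 1 \\\n    12 \\\n    2\nint b;\n"

strict_match = strict.search(source).group(0)
loose_match = loose.search(source).group(0)

# The greedy + consumes both continuation lines; .*$ adds the final line "    2".
print(repr(strict_match))

# \s* also matches the blank line, so this result begins with a newline.
print(repr(loose_match))
```

In both cases the match stops before `int b;`, because `.` does not match a newline and the last macro line has no trailing backslash.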
Douglas Leeder answered Oct 03 '22 17:10


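import re

start        = r"^\s*#define\s+"   # optional leading whitespace, then #define
continuation = r"(?:.*\\\n)+"      # one or more lines ending in a backslash
lastline     = r".*$"              # the first line without a trailing backslash

re_multiline_macros = re.compile(start + continuation + lastline,
                                 re.MULTILINE)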
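A possible way to use this pattern is to collect every multi-line macro with `finditer`. The sample source below is a made-up illustration, and the `lstrip()` call is an added assumption: since `\s*` can also match blank lines before `#define`, it removes any leading whitespace from each result.

```python
import re

start        = r"^\s*#define\s+"
continuation = r"(?:.*\\\n)+"
lastline     = r".*$"

re_multiline_macros = re.compile(start + continuation + lastline,
                                 re.MULTILINE)

# Hypothetical source: an include, a single-line macro, a multi-line macro,
# and a line of normal code.
source = (
    '#include "Foo.h"\n'
    "#define single 1\n"
    "#define foo(x) if(x) \\\n"
    "doSomething(x)\n"
    "normalCode();\n"
)

# Only the multi-line macro should be collected; the single-line one has no
# backslash-newline, so the continuation part cannot match it.
macros = [m.group(0).lstrip() for m in re_multiline_macros.finditer(source)]

for macro in macros:
    print(macro)
```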
jfs answered Oct 03 '22 17:10