Python 3 regular expression to find multiline comment

I'm trying to find comment blocks in PHP source code using regular expressions in Python 3. The PHP comments are in this format:

/**
 * This is a very short block comment
 */

Now I came up with the following regular expression:

'/\*\*[.]+?\*/'

I figured that, in combination with the DOTALL flag, this should do it, but it doesn't find anything. The strange thing is that when I remove the trailing slash, like this:

'/\*\*[.]+?\*'

then it finds the following string:

/**\n\t*

I have no idea why the regex can't find an asterisk followed by a slash... I checked the file I'm searching to make sure I didn't have a typo in the comment (I didn't). Also, a slash is not a special character in regex, so I shouldn't have to escape it. (I tried, but it didn't help.)

Can anyone tell me what's wrong with my regex? :)

By the way, I also came across this thread where someone tried to do the same in Java. The final winning answer finished his regular expression the same way I do now, so I'm clueless :( Could this be a bug in Python regex or am I completely missing something?

Any help is much appreciated! :D

asked Apr 01 '26 23:04 by lunanoko


1 Answer

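Inside a character class, . loses its special meaning and matches only a literal dot, so [.]+? can never match the text of the comment, with or without DOTALL. Use a bare . instead, and pass the re.DOTALL flag so that it also matches newlines: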

re.compile(r'/\*\*.+?\*/', re.DOTALL)
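To see the difference, here is a minimal sketch that runs both patterns against a small sample. The php_source string and the variable names are made up for illustration; only the two patterns come from the question and the answer.

```python
import re

# Hypothetical PHP snippet, used only for illustration.
php_source = """<?php
/**
 * This is a very short block comment
 */
function demo() {}
"""

# [.] is a character class holding only a literal dot, so DOTALL does not affect it.
broken = re.compile(r'/\*\*[.]+?\*/', re.DOTALL)

# A bare . matches any character, including newlines, once DOTALL is set.
fixed = re.compile(r'/\*\*.+?\*/', re.DOTALL)

broken_matches = broken.findall(php_source)
fixed_matches = fixed.findall(php_source)

print(broken_matches)  # no matches
print(fixed_matches)   # the whole comment block, newlines included
```

The lazy quantifier +? matters here: it stops at the first */ rather than running on to the last one in the file.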

(As a side note, PHP block comments can start with /*, not just /**.)
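If you also want the plain /* ... */ form, one possible variation is to require only a single asterisk after the opening slash. This is a sketch under the assumption that comment-like text inside PHP string literals is not a concern, since a regex alone cannot tell the two apart; the sample source and names are hypothetical.

```python
import re

# Hypothetical PHP snippet with both comment styles.
php_source = """<?php
/* plain block comment */
$x = 1;
/**
 * Doc comment
 */
"""

# /\* matches the opener of either style; .*? lazily consumes up to the first */.
any_block_comment = re.compile(r'/\*.*?\*/', re.DOTALL)

comments = any_block_comment.findall(php_source)
for comment in comments:
    print(repr(comment))
```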

answered Apr 04 '26 12:04 by jtbandes


