I am trying to use the re.MULTILINE flag.
I read these posts: "Bug in Python Regex? (re.sub with re.MULTILINE)" and "Python re.sub MULTILINE caret match", but it still doesn't work. The code:
import re

if __name__ == '__main__':
    txt = "\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"
    new_txt = re.sub(r'\/\*[.\n]*?\*\/', '', txt, flags=re.MULTILINE)
    print("\n=========== TXT ============")
    print(txt)
    print("\n=========== NEW TXT ============")
    print(new_txt)
The output:
=========== TXT ============
<?php
/* Multi-line
comment */
$var = 1;
=========== NEW TXT ============
<?php
/* Multi-line
comment */
$var = 1;
But new_txt should not contain the multi-line comment. I want to get txt without the multi-line comment. Do you have any idea?
The re.MULTILINE flag tells Python to make the '^' and '$' special characters match at the start or end of any line within a string. Using this flag: >>> match = re.search(r'^It has. *', paragraph, re.
The re.MULTILINE search modifier forces the ^ symbol to match at the beginning of each line of text (and not just the first), and the $ symbol to match at the end of each line of text (and not just the last one). The re.MULTILINE search modifier takes no arguments.
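As a quick illustration of the point above, here is a minimal sketch of what re.MULTILINE changes for ^ and $. The sample text and variable names are made up for the example and do not come from the question.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r'^\w+', text)

# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r'^\w+', text, flags=re.MULTILINE)

# $ is affected the same way: it now matches before each newline too.
ends_multiline = re.findall(r'\w+$', text, flags=re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second', 'third']
print(ends_multiline)    # ['line', 'line', 'line']
```

Note that nothing here changes what the dot matches, which is why the flag does not help with the comment-stripping pattern in the question.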
To replace text in Python with a regex, use re.sub(). It is a function in the standard re module that returns the string with the replacements applied. Don't forget to import the re module. The function searches for the pattern in the string and replaces each match with the given replacement.
In Python, you have different ways to specify a multiline string. You can split a string across multiple lines by enclosing it in triple quotes. Alternatively, parentheses can be used to spread a string over several lines. A backslash also works as a line continuation character in Python.
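For reference, the three ways of writing a multiline string mentioned above might look like this. It is only a small sketch with invented variable names; the last form is the one the question's code uses.

```python
# Triple quotes keep the line breaks exactly as they appear in the source.
triple = """line one
line two"""

# Adjacent literals inside parentheses are concatenated;
# the newline has to be written explicitly.
parenthesized = (
    "line one\n"
    "line two"
)

# A backslash at the end of the line continues the literal
# on the next line without adding a newline of its own.
continued = "line one\n\
line two"

print(triple == parenthesized == continued)  # True
```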
You need to replace re.MULTILINE with re.DOTALL (or its short form re.S) and move the period outside the character class, because inside a character class the dot matches a literal . character.
Note that re.MULTILINE only redefines the behavior of ^ and $, which are then forced to match at the start/end of each line rather than of the whole string. The re.DOTALL flag redefines the behavior of . in the pattern, but only outside character classes: it starts matching a newline symbol, too.
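To make the difference concrete, here is a small sketch comparing the dot inside and outside a character class under each flag. The sample string is invented for the illustration.

```python
import re

sample = "a.b\nc"  # hypothetical sample string

# Inside a character class the dot is a literal period,
# with or without re.DOTALL.
literal_dots = re.findall(r'[.]', sample, flags=re.DOTALL)

# Outside a character class the dot stops at newlines by default...
default_run = re.match(r'.*', sample).group()

# ...and crosses them once re.DOTALL (re.S) is set.
dotall_run = re.match(r'.*', sample, flags=re.DOTALL).group()

# re.MULTILINE leaves the dot alone, so the match still stops at the newline.
multiline_run = re.match(r'.*', sample, flags=re.MULTILINE).group()

print(literal_dots)         # ['.']
print(repr(default_run))    # 'a.b'
print(repr(dotall_run))     # 'a.b\nc'
print(repr(multiline_run))  # 'a.b'
```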
So, the regex you could use for the current example is /\*.*?\*/ . It matches a literal /* with /\* , then .*? matches as few characters as possible up to the first */ (which is matched with \*/ ).
See the code demo:
import re

txt = """\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"""
# re.S (re.DOTALL) lets the dot match newlines, so .*? can span lines
new_txt = re.sub(r'/\*.*?\*/', '', txt, flags=re.S)
print("\n=========== TXT ============")
print(txt)
print("\n=========== NEW TXT ============")
print(new_txt)
See IDEONE demo
However, this is not the best solution, since multiline comments are often very long and the lazy .*? has to check for */ at every position. A better approach is the unroll-the-loop technique. The regex above can be "unrolled" like this:
/\*[^*]*(?:\*(?!/)[^*]*)*\*/
See the regex demo
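In Python, the unrolled pattern could be used roughly as follows. This is a sketch reusing the txt value from the question; the name COMMENT_RE is just chosen for the example. Since the pattern relies on negated character classes rather than the dot, it should not need re.DOTALL at all.

```python
import re

# Unrolled pattern from above. [^*] already matches newlines,
# so no flags are required.
COMMENT_RE = re.compile(r'/\*[^*]*(?:\*(?!/)[^*]*)*\*/')

txt = "\n<?php\n/* Multi-line\ncomment */\n$var = 1;\n"
new_txt = COMMENT_RE.sub('', txt)
print(new_txt)

# Stars inside the comment body are handled by the (?:\*(?!/)[^*]*)* part.
starred = COMMENT_RE.sub('', "before /* a * b ** c */ after")
print(starred)
```

The first call leaves an empty line where the comment used to be, because the newline after */ is not part of the match.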