 

Python re.sub multiline on string

I'm trying to use the re.MULTILINE flag.

I have read these posts: Bug in Python Regex? (re.sub with re.MULTILINE) and Python re.sub MULTILINE caret match, but it still doesn't work. The code:

import re
if __name__ == '__main__':

    txt = "\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"
    new_txt = re.sub(r'\/\*[.\n]*?\*\/', '', txt, flags=re.MULTILINE)
    print("\n=========== TXT ============")
    print(txt)
    print("\n=========== NEW TXT ============")
    print(new_txt)

The output:

=========== TXT ============

<?php
/* Multi-line
comment */
$var = 1;


=========== NEW TXT ============

<?php
/* Multi-line
comment */
$var = 1;

But new_txt should not contain the multi-line comment. I want to get txt without the multi-line comment. Any ideas?

Samuel Dauzon asked Nov 24 '15 09:11



People also ask

What is re.MULTILINE in Python?

The re.MULTILINE flag tells Python to make the ^ and $ special characters match at the start or end of any line within a string. Using this flag: >>> match = re.search(r'^It has. *', paragraph, re.

What does re.MULTILINE do?

The re.MULTILINE flag forces the ^ symbol to match at the beginning of each line of text (not just the first one), and the $ symbol to match at the end of each line of text (not just the last one). It is a plain flag value and takes no arguments of its own.
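A minimal sketch of the difference, using a made-up three-line string (the text is only illustrative):

```python
import re

text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ anchors only at the very start of the string.
without_flag = re.findall(r"^\w+", text)
print(without_flag)  # ['first']

# With re.MULTILINE, ^ also anchors right after every newline.
with_flag = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(with_flag)  # ['first', 'second', 'third']

# Likewise, $ matches before every newline (and at the end of the string).
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)
print(line_ends)  # ['line', 'line', 'line']
```

Note that the flag says nothing about what . matches; that is the job of re.DOTALL.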

How do you use re.sub in Python?

To replace text that matches a regular expression, use the sub() function from the re module (don't forget to import re). It searches the string for the pattern and returns a new string in which each match is replaced with the given replacement.
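A short illustration of re.sub with made-up sample text. The signature is re.sub(pattern, repl, string, count=0, flags=0), so flags should be passed by keyword, as the question does; a fourth positional argument would be taken as count, which is the pitfall discussed in the "Bug in Python Regex?" post linked above.

```python
import re

original = "cat bat rat"

# re.sub returns a new string; the original is left unchanged.
result = re.sub(r"[cb]at", "dog", original)
print(result)    # dog dog rat
print(original)  # cat bat rat

# count limits how many replacements are made (passed by keyword here).
first_only = re.sub(r"[cb]at", "dog", original, count=1)
print(first_only)  # dog bat rat
```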

How do you write a multi line string in Python?

In Python, there are several ways to write a multiline string. You can enclose it in triple quotes. Alternatively, you can put adjacent string literals inside parentheses, which Python joins into one string. Finally, a backslash at the end of a line works as a line continuation character.
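A sketch building the same two-line string in each of the three ways (the text itself is arbitrary):

```python
# 1. Triple quotes: the line break inside the literal becomes part of the string.
triple = """first line
second line"""

# 2. Parentheses: adjacent string literals are concatenated at compile time.
parens = ("first line\n"
          "second line")

# 3. Backslash: a trailing backslash inside an ordinary literal continues it
#    on the next line WITHOUT inserting a newline, so \n is written explicitly.
backslash = "first line\n\
second line"

print(triple == parens == backslash)  # True
```

The question's txt uses the third form, which is why every line ends in \n\.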


1 Answer

You need to replace re.MULTILINE with re.DOTALL (alias re.S) and move the period out of the character class, because inside a character class the dot matches only a literal ".".
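A small check of why the original pattern never matches, reusing the question's regex on a shortened comment:

```python
import re

# Inside a character class "." is an ordinary character, so [.\n] matches
# only a literal dot or a newline, never "any character".
dot_hits = re.findall(r"[.\n]", "a.b\nc")
print(dot_hits)  # ['.', '\n']

# Hence the question's pattern stops at the first character inside the
# comment that is neither "." nor "\n", finds no match, and re.sub
# returns the text unchanged.
txt = "/* Multi-line\ncomment */"
unchanged = re.sub(r"\/\*[.\n]*?\*\/", "", txt, flags=re.MULTILINE)
print(unchanged == txt)  # True
```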

Note that re.MULTILINE only redefines the behavior of ^ and $, making them match at the start/end of every line rather than only at the start/end of the whole string. The re.DOTALL flag redefines the behavior of . in the pattern (outside a character class only): with it, . matches a newline character, too.
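A minimal comparison of the two flags on the same input, a shortened version of the question's comment:

```python
import re

txt = "/* Multi-line\ncomment */"

# re.MULTILINE only affects ^ and $, so "." still cannot cross "\n".
multiline_match = re.search(r"/\*.*?\*/", txt, flags=re.MULTILINE)
print(multiline_match)  # None

# re.DOTALL (alias re.S) lets "." match "\n", so the whole comment matches.
dotall_match = re.search(r"/\*.*?\*/", txt, flags=re.DOTALL)
print(dotall_match.group() == txt)  # True

# DOTALL has no effect inside a character class: [.] is still a literal dot.
class_match = re.search(r"[.]", "\n", flags=re.DOTALL)
print(class_match)  # None
```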

So the regex you could use for the current example is /\*.*?\*/. It matches a literal /* with /\*, then .*? matches as few characters as possible (newlines included, thanks to re.S) up to the first */, which is matched with \*/.

See the code demo:

import re

txt = """\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"""
# re.S (re.DOTALL) lets . match newlines, so .*? can span the comment's lines
new_txt = re.sub(r'/\*.*?\*/', '', txt, flags=re.S)
print("\n=========== TXT ============")
print(txt)
print("\n=========== NEW TXT ============")
print(new_txt)

See IDEONE demo

However, this is not the most efficient solution: multiline comments are often long, and the lazy .*? has to test for */ after every character it consumes. A better option is the unrolling-the-loop technique. The regex above can be "unrolled" like this:

/\*[^*]*(?:\*(?!/)[^*]*)*\*/

See the regex demo
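A sketch comparing the lazy and unrolled patterns on a made-up PHP snippet that contains stars inside a comment. The unrolled form needs no re.S, because the negated class [^*] already matches newlines.

```python
import re

# Lazy version: needs re.S so "." can cross newlines.
LAZY = re.compile(r"/\*.*?\*/", re.S)

# Unrolled version:
#   /\*                  opening delimiter
#   [^*]*                run of non-star characters (newlines included)
#   (?:\*(?!/)[^*]*)*    a star not followed by "/", then more non-stars
#   \*/                  closing delimiter
UNROLLED = re.compile(r"/\*[^*]*(?:\*(?!/)[^*]*)*\*/")

txt = "<?php\n/** Doc *block*\n with stars **/\n$var = 1; /* short */\n"
lazy_result = LAZY.sub("", txt)
unrolled_result = UNROLLED.sub("", txt)

print(repr(unrolled_result))
print(lazy_result == unrolled_result)  # True
```

Both patterns remove the same comments; the unrolled one simply consumes runs of non-star characters in one step instead of checking for */ at each position.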

Wiktor Stribiżew answered Sep 25 '22 13:09
