
ignoring newline character in regex match

Tags: python, regex

I am trying to replace all matching occurrences with title cases using the following script. When there is a newline character between filter words (in this case 'ABC' and 'DEF') that line doesn't get replaced as intended.

How can I ignore the newline character in this case?

Edit: I don't want to strip all newline characters entirely from the string, but only strip those between the filter words.

Edit2: I edited the text and script to better reflect the issue I am experiencing. If I include the flags=re.DOTALL argument, it gives me:

  mmm    = "Hello Hello Hello Hello Hello Hello
              Hello Hello Hello Hello",
  Bbb   = "Bbb",

whereas the output I want is (notice that bbb is not capitalized):

  mmm    = "Hello Hello Hello Hello Hello Hello
              Hello Hello Hello Hello",
  bbb   = "bbb",

The following is the script I am using.

import re

test_string = '''
  mmm    = "hello hello hello hello hello hello
              hello hello hello hello",
  bbb   = "bbb",
'''

rex = r'(?<= mmm)(.*)(?=\")'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)

print(formatted)
Layray asked Sep 06 '18 06:09


People also ask

How do you escape a new line in regex?

If you don't want to match a real line break but the two-character string '\n', escape the backslash with another one (\\n) so that it is not interpreted as a line break.
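A small sketch of the difference, using made-up sample strings (the variable names are only for this demo): the pattern r'\\n' matches a backslash followed by the letter n, while r'\n' matches an actual line break.

```python
import re

real_break = "a\nb"       # contains one actual newline character
literal_text = r"a\nb"    # contains a backslash followed by the letter n

# Escaped backslash followed by 'n': matches the two-character text only.
two_char = re.compile(r'\\n')
# Regex escape for a line break: matches the newline character only.
line_break = re.compile(r'\n')

print(bool(two_char.search(literal_text)))    # True
print(bool(two_char.search(real_break)))      # False
print(bool(line_break.search(real_break)))    # True
print(bool(line_break.search(literal_text)))  # False
```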

Does regex dot match newline?

The dot matches a single character, without caring what that character is. The only exceptions are line break characters. In all regex flavors discussed in this tutorial, the dot does not match line breaks by default.

How do you match everything including newline regex?

If you want . to match really everything, including newlines, you need to enable “dot-matches-all” mode in your regex engine of choice (for example, add the re.DOTALL flag in Python, or /s in PCRE).
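A minimal Python sketch of that mode, on an invented two-line string: the same pattern is run with and without re.DOTALL, and once with the equivalent inline flag (?s).

```python
import re

text = "first\nsecond"

# By default '.' stops at the line break, so only the first line matches.
default_match = re.search(r'.+', text).group(0)

# With re.DOTALL, '.' also matches '\n', so the match spans both lines.
dotall_match = re.search(r'.+', text, flags=re.DOTALL).group(0)

# The inline flag (?s) has the same effect as passing re.DOTALL.
inline_match = re.search(r'(?s).+', text).group(0)

print(repr(default_match))  # 'first'
print(repr(dotall_match))   # 'first\nsecond'
print(repr(inline_match))   # 'first\nsecond'
```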


2 Answers

Use the re.DOTALL flag:

formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)

According to the docs:

re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
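To see what the flag changes on input shaped like the question's, here is a small sketch (the shortened test string is an assumption made for brevity). Note that with the question's pattern the greedy .* then runs up to the last double quote in the whole string, which is the over-capitalization described in Edit2 of the question, so the flag alone may not be enough.

```python
import re

# Shortened version of the question's input (assumed for this demo).
test_string = '''
  mmm    = "hello hello
              hello hello",
  bbb   = "bbb",
'''

rex = r'(?<= mmm)(.*)(?=\")'

# Without the flag, '.' cannot cross the line break, so the match stays on
# the first line and backtracks to just before the opening quote.
without_flag = re.search(rex, test_string).group(0)

# With re.DOTALL, the greedy '.*' crosses line breaks and only backtracks
# to the last double quote in the string, swallowing the bbb line as well.
with_flag = re.search(rex, test_string, flags=re.DOTALL).group(0)

print(repr(without_flag))
print(repr(with_flag))
```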

Mikhail Burshteyn answered Oct 01 '22 18:10


The following code gives the result you expect:

import re

test_string = '''
  mmm    = "hello hello hello hello hello hello
              hello hello hello hello",
  bbb   = "bbb",
'''

rex = r'(?<= mmm)\s*=\s*"[^"]*'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string)

print(formatted)

I'm assuming that the value you want to title-case is always between double quotes, and that it cannot contain a double quote (not even one escaped in some way). Handling escaping would be possible with a slightly more complex regex, though.
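As a rough sketch of what such a regex might look like, assuming the escaping convention is a backslash before the quote (the sample input and the name rex_escaped are hypothetical, not from the question):

```python
import re

# Hypothetical input: the value contains backslash-escaped double quotes.
sample = r'''
  mmm    = "hello \"quoted words\" hello
              hello hello",
  bbb   = "bbb",
'''

# After the opening quote, consume either a character that is neither a
# quote nor a backslash, or a backslash followed by any one character.
rex_escaped = r'(?<= mmm)\s*=\s*"(?:[^"\\]|\\.)*'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex_escaped, maketitle, sample)
print(formatted)
```

One caveat with this sketch: str.title() treats every non-letter as a word boundary, so a two-character sequence such as \n inside the value would come out as \N.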

Pierre-Antoine answered Oct 01 '22 17:10