I am trying to convert all matching occurrences to title case using the following script. When there is a newline character between the filter words (in this case 'ABC' and 'DEF'), that occurrence doesn't get replaced as intended.
How can I ignore the newline character in this case?
Edit: I don't want to strip all newline characters entirely from the string, but only strip those between the filter words.
Edit2: I edited the text and script to better reflect the issue I am experiencing. If I include the flags=re.DOTALL argument, it gives me:
mmm = "Hello Hello Hello Hello Hello Hello
Hello Hello Hello Hello",
Bbb = "Bbb",
whereas the output I want is (notice that bbb is not capitalized):
mmm = "Hello Hello Hello Hello Hello Hello
Hello Hello Hello Hello",
bbb = "bbb",
The following is the script I am using.
import re

# The lookbehind (?<= mmm) expects a space before "mmm", so the lines are indented.
test_string = '''
    mmm = "hello hello hello hello hello hello
    hello hello hello hello",
    bbb = "bbb",
'''
rex = r'(?<= mmm)(.*)(?=\")'

def maketitle(match_obj):
    # Title-case the whole matched text.
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)
If you don't want to match a real line break but a two-character string like '\n' (a backslash followed by the letter n), you just have to escape the backslash with another one, \\n, so that it is not recognized as a line break.
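To make the difference concrete, here is a small sketch. The variable names and the 'ABC'/'DEF' sample strings are only illustrative, borrowed from the filter words mentioned in the question.

```python
import re

text_with_newline = "ABC\nDEF"    # a real line break between the words (one character)
text_with_literal = r"ABC\nDEF"   # a backslash followed by "n" (two characters)

newline_pattern = re.compile(r"ABC\nDEF")    # the regex engine reads \n as a line break
literal_pattern = re.compile(r"ABC\\nDEF")   # \\ matches one literal backslash, then "n"

print(bool(newline_pattern.search(text_with_newline)))  # True
print(bool(newline_pattern.search(text_with_literal)))  # False
print(bool(literal_pattern.search(text_with_literal)))  # True
print(bool(literal_pattern.search(text_with_newline)))  # False
```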
The dot matches a single character, without caring what that character is. The only exceptions are line break characters. In all regex flavors discussed in this tutorial, the dot does not match line breaks by default.
If you want . to match really everything, including newlines, you need to enable “dot-matches-all” mode in your regex engine of choice (for example, add the re.DOTALL flag in Python, or /s in PCRE).
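A quick way to see this in Python, as a minimal sketch (the sample string is made up for illustration):

```python
import re

sample = "ABC\nDEF"

# By default the dot refuses to match the line break, so there is no match.
without_flag = re.search(r"ABC.DEF", sample)

# With re.DOTALL the dot matches the newline as well.
with_flag = re.search(r"ABC.DEF", sample, flags=re.DOTALL)

# The inline (?s) modifier is equivalent to passing re.DOTALL.
inline_flag = re.search(r"(?s)ABC.DEF", sample)

print(without_flag)                 # None
print(repr(with_flag.group(0)))     # 'ABC\nDEF'
print(repr(inline_flag.group(0)))   # 'ABC\nDEF'
```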
Use the re.DOTALL flag:
formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)
According to the docs:
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
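One thing to keep in mind with the flag alone: a greedy .* combined with re.DOTALL runs to the end of the string and then backs up to the last double quote, which would explain why bbb also gets capitalized in the question's Edit2. A small sketch with a shortened, made-up input:

```python
import re

# Hypothetical shortened input; the leading space satisfies the (?<= mmm) lookbehind.
sample = ' mmm = "a\nb",\n bbb = "c",'

# Greedy .* with DOTALL consumes everything, then backtracks to the LAST double quote.
greedy = re.search(r'(?<= mmm)(.*)(?=")', sample, flags=re.DOTALL)

# The match spills over into the bbb line.
print(repr(greedy.group(0)))  # ' = "a\nb",\n bbb = "c'
```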
The following code gives the result you expect:
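import re

# The lookbehind (?<= mmm) expects a space before "mmm", so the lines are indented.
test_string = '''
    mmm = "hello hello hello hello hello hello
    hello hello hello hello",
    bbb = "bbb",
'''
# [^"]* matches everything up to the next double quote, newlines included,
# so no re.DOTALL flag is needed and the match cannot run past the closing quote.
rex = r'(?<= mmm)\s*=\s*"[^"]*'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string)
print(formatted)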
I'm assuming that the value you want to "title-case" is always between double quotes, and that it cannot contain a double quote (not even one escaped in some way). Handling escaping would be possible with a slightly more complex regex, though.
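For what it's worth, here is one way such a regex could look. This is only a sketch: it assumes that quotes inside the value are escaped with a backslash (\"), which the question does not state, and the names sample and rex_escaped are made up for the example.

```python
import re

# Raw string, so \" stays as a backslash followed by a double quote in the data.
sample = r'''
    mmm = "hello \"quoted\" hello
    hello hello",
    bbb = "bbb",
'''

# Inside the value, accept either a character that is neither a quote nor a
# backslash, or a backslash followed by any single character (an escape pair).
rex_escaped = r'(?<= mmm)\s*=\s*"(?:[^"\\]|\\.)*'

def maketitle(match_obj):
    return match_obj.group(0).title()

result = re.sub(rex_escaped, maketitle, sample)
print(result)
```

The escaped quotes no longer end the match early, and the bbb line is still left alone because the match stops at the first unescaped double quote.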