I am trying to replace all the text inside the <div class="one"> tag.
What I have so far works, but only if everything is on one line.
text_msg is the
text = re.sub('<div class="one">.*?</div>',new_text,text_msg,re.DOTALL)
<div class="one">replace this
more text here
another line
</div>
I have tried re.MULTILINE, got nowhere. What am I doing wrong?
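I modified your re.sub call. The problem with your current code is that you aren't using the flags keyword argument to specify flags: the fourth positional parameter of re.sub is count, so re.DOTALL is being passed as a count rather than as a flag. I also changed your regex to use a lookbehind, (?<=<div class="one">), and a lookahead, (?=<\/div>), so that only the text between the tags is replaced and the tags themselves are kept.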
import re
text_msg = """
<html>
<head>
<title>Terrible webpage</title>
</head>
<body>
<div class="one">Cool text!</div>
<b>test</b>
<div class="one">Second text!</div>
<div class="one">third text!</div>
<div class="one">replace this
more text here
another line
</div>
</body>
</html>
"""
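# Raw string so that \/ is not treated as an invalid string escape;
# flags= is passed by keyword so that DOTALL is not read as count.
print(re.sub(r'(?<=<div class="one">).*?(?=<\/div>)', "out", text_msg, flags=re.DOTALL))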
Output:
<html>
<head>
<title>Terrible webpage</title>
</head>
<body>
<div class="one">out</div>
<b>test</b>
<div class="one">out</div>
<div class="one">out</div>
<div class="one">out</div>
</body>
</html>
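As a quick check of the explanation above, here is a minimal sketch. The shortened text_msg and the "NEW" replacement are made up for illustration. It relies on the documented signature re.sub(pattern, repl, string, count=0, flags=0), where a flag placed in the fourth position ends up as count:

```python
import re

# Shortened, hypothetical sample; not the full page from the answer above.
text_msg = '<div class="one">replace this\nmore text here\n</div>'
pattern = r'<div class="one">.*?</div>'

# re.DOTALL is an integer-valued flag (16). Passing it in the fourth position
# of re.sub is therefore equivalent to count=16, with no flag set at all.
as_count = re.sub(pattern, "NEW", text_msg, count=int(re.DOTALL))

# Passing it by keyword really enables DOTALL, so . also matches newlines.
as_flag = re.sub(pattern, "NEW", text_msg, flags=re.DOTALL)

print(as_count == text_msg)  # True: without DOTALL nothing matches across the newlines
print(as_flag)               # NEW
```

This variant replaces the whole div, tags included, as in the question; the lookbehind/lookahead version above keeps the tags.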
Just replace . with [\s\S] in your regex as shown below:
<div class=\"one\">[\s\S]*?<\/div>
Explanation:
<div class=\"one\"> - literally matches <div class="one">
[\s\S]*? - matches 0+ occurrences of any character (including the newline character), as few as possible
<\/div> - literally matches </div>
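For reference, a minimal sketch of how this pattern might be used with re.sub in Python. The new_text value is a hypothetical replacement, since the question does not show it, and the sample string is the one from the question. Because [\s\S] matches newlines by itself, no flag is needed:

```python
import re

text_msg = '<div class="one">replace this\nmore text here\nanother line\n</div>'
new_text = '<div class="one">new text</div>'  # hypothetical replacement value

# [\s\S] matches any character, newlines included, so DOTALL is not required.
# The quotes and the slash need no escaping inside a Python raw string.
result = re.sub(r'<div class="one">[\s\S]*?</div>', new_text, text_msg)

print(result)  # <div class="one">new text</div>
```

The pattern consumes the opening and closing tags as well, so the replacement string has to include them if they should remain in the output.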