Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Regular Expression replace between div

I am trying to replace all text between the div class="one" tag What I have so far works, but only if everything is on one line.
text_msg is the

text = re.sub('<div class="one">.*?</div>',new_text,text_msg,re.DOTALL)

<div class="one">replace this 
more text here
another line
</div>

I have tried re.MULTILINE, got nowhere. What am I doing wrong?

like image 671
user3525290 Avatar asked May 15 '26 05:05

user3525290


2 Answers

I went and modified your re.sub. The problem with your current code is that you aren't using the flags key word arguments to specify flags. I also changed your regex to look for a precursor pattern (?<=<div class="one">) and post pattern (?=<\/div>).

import re

text_msg = """
<html>
<head>
<title>Terrible webpage</title>
</head>
<body>

<div class="one">Cool text!</div>
<b>test</b>
<div class="one">Second text!</div>
<div class="one">third text!</div>
<div class="one">replace this 
more text here
another line
</div>

</body>
</html>
"""

print(re.sub('(?<=<div class="one">).*?(?=<\/div>)',"out",text_msg,flags=re.DOTALL))

Output:

<html>
<head>
<title>Terrible webpage</title>
</head>
<body>

<div class="one">out</div>
<b>test</b>
<div class="one">out</div>
<div class="one">out</div>
<div class="one">out</div>

</body>
</html>
like image 130
Neil Avatar answered May 17 '26 17:05

Neil


Just replace . with [\s\S] in your regex as shown below:

<div class=\"one\">[\s\S]*?<\/div>

Click for Demo

Explanation:

  • <div class=\"one\"> - literally matches <div class="one">
  • [\s\S]*? - matches 0+ occurrences of any character(include the newline character), as few as possible
  • <\/div> - literally matches </div>
like image 28
Gurmanjot Singh Avatar answered May 17 '26 19:05

Gurmanjot Singh