
python regular expression matching anything

Tags: python, regex

My regular expression isn't doing anything to my string.


import re

data = 'random\n<article stuff\n</article>random stuff'
datareg = re.sub(r'.*<article(.*)</article>.*', r'<article\1</article>', data, flags=re.MULTILINE)
print(datareg)

I get:

random
<article stuff
</article>random stuff

I want:

<article stuff
</article>
user1442957 asked Sep 12 '12 22:09



1 Answer

re.MULTILINE doesn't make your regex multiline in the way you want. It only changes where '^' and '$' match, as the re documentation explains:

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
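To make the quoted behaviour concrete, here is a small sketch of what re.MULTILINE changes. The sample string and variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical two-line sample, not taken from the question.
text = 'first line\nsecond line'

# Without the flag, '^' matches only at the very start of the string.
default_starts = re.findall(r'^\w+', text)

# With re.MULTILINE, '^' also matches immediately after each newline.
multiline_starts = re.findall(r'^\w+', text, flags=re.MULTILINE)

print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']
```

Note that '.' still stops at the newline in both calls; the flag only affects the anchors.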

re.DOTALL does:

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

Change flags=re.MULTILINE to flags=re.DOTALL and your regex will work.
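As a quick check, this sketch runs the question's pattern with both flags side by side. The names pattern, replacement, unchanged and fixed are introduced here for illustration only.

```python
import re

data = 'random\n<article stuff\n</article>random stuff'
pattern = r'.*<article(.*)</article>.*'
replacement = r'<article\1</article>'

# With re.MULTILINE, '.' still cannot cross a newline, so the pattern
# never matches (the two tags sit on different lines) and the string
# comes back untouched.
unchanged = re.sub(pattern, replacement, data, flags=re.MULTILINE)

# With re.DOTALL, '.' matches newlines too, so the whole string is
# matched and replaced by just the article element.
fixed = re.sub(pattern, replacement, data, flags=re.DOTALL)

print(unchanged == data)  # True
print(fixed)
```

The second print shows only the two article lines, which is the output the question asks for.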

Blender answered Sep 27 '22 00:09
