
Regular expression: how to match a string containing "\n" (newline)?

I'm trying to extract data from a SQL export file with a regular expression. To match the post content field, I use '(?P<content>.*?)'. It works fine most of the time, but if the field contains '\n', the regular expression doesn't match. How can I modify the regular expression to match such fields? Thanks!

Example (I'm using Python):

>>> re.findall("'(?P<content>.*?)'","'<p>something, something else</p>'")
['<p>something, something else</p>']

>>> re.findall("'(?P<content>.*?)'","'<p>something, \n something else</p>'")
[]

P.S. It seems that every sequence starting with '\' is treated as an escape sequence. How can I tell the regex to treat them literally?

asked Nov 16 '11 at 11:11 by Xun Yang


1 Answer

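You should use the re.DOTALL flag, which makes '.' match any character, including a newline (by default '.' matches anything except a newline):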

>>> re.findall("'(?P<content>.*?)'","'<p>something, \n something else</p>'", re.DOTALL)
['<p>something, \n something else</p>']

See the Python documentation for re.DOTALL.
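Regarding the P.S.: in an ordinary Python string literal, "\n" is turned into a single newline character before the regex engine ever sees it, which is why the example above fails without the flag. The following is a minimal sketch of the difference, assuming the dump might instead contain a literal backslash followed by the letter n; the variable names are made up for illustration and are not from the question.

```python
import re

PATTERN = r"'(?P<content>.*?)'"

# A real newline character inside the quoted field: this is what "\n"
# produces in a normal (non-raw) Python string literal.
with_newline = "'<p>something, \n something else</p>'"

# A literal backslash followed by the letter n (two characters), written
# here with a raw string. Assumption: a SQL dump may store newlines this way.
with_backslash_n = r"'<p>something, \n something else</p>'"

# Without a flag, '.' stops at the real newline, so nothing matches.
no_flag = re.findall(PATTERN, with_newline)

# re.DOTALL lets '.' cross the newline.
dotall = re.findall(PATTERN, with_newline, re.DOTALL)

# The inline flag (?s) is equivalent to passing re.DOTALL.
inline = re.findall("(?s)" + PATTERN, with_newline)

# A literal backslash is an ordinary character to '.', so no flag is needed.
literal = re.findall(PATTERN, with_backslash_n)

print(no_flag)
print(dotall)
print(inline)
print(literal)
```

If the pattern cannot take flags (for example, when it is passed through another tool), replacing '.' with a character class such as [\s\S] is another common way to match across newlines.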

answered Sep 20 '22 at 12:09 by Adam Zalcman