python multiline regular expressions [duplicate]

Tags:

python

regex

How do I extract all the characters (including newline characters) until the first occurrence of a given sequence of words? For example, with the following input:

input text:

"shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"

The sequence is "the": I want to extract the text from "shantaram" to the first occurrence of "the", which is in the second line.

The output must be:

shantaram is an amazing novel.
It is one of the

I have been trying all morning. I can write an expression that extracts all characters up to a specific character, but here, if I use an expression like:

re.search("shantaram[\s\S]*the", string)

It doesn't match across newlines.

AKASH asked Sep 22 '13 11:09


1 Answer

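Use this regex:

re.search(r"shantaram[\s\S]*?the", string)

instead of

re.search(r"shantaram[\s\S]*the", string)

The only difference is the '?'. Adding '?' after a quantifier (e.g. *?, +?) makes it non-greedy, so it matches as few characters as possible and stops at the first "the" rather than the last one. Note that [\s\S] already matches newlines; the original pattern simply ran on too far. The r prefix just makes the pattern a raw string so the backslashes reach the regex engine unchanged.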
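To see the difference concretely, here is a minimal sketch run against the input text from the question. The variable names (text, greedy, lazy, dotall) are made up for illustration, and the re.DOTALL variant is an alternative spelling I am adding, not something from the question.

```python
import re

# Input text from the question, with explicit newlines.
text = (
    "shantaram is an amazing novel.\n"
    "It is one of the best novels i have read.\n"
    "the novel is written by gregory david roberts.\n"
    "He is an australian"
)

# Greedy: [\s\S]* grabs as much as possible, so the match ends at the LAST "the".
greedy = re.search(r"shantaram[\s\S]*the", text)

# Non-greedy: [\s\S]*? grabs as little as possible, so the match ends at the FIRST "the".
lazy = re.search(r"shantaram[\s\S]*?the", text)

# Alternative (an assumption about preference, not from the question):
# with re.DOTALL, "." also matches newlines, so ".*?" behaves like "[\s\S]*?".
dotall = re.search(r"shantaram.*?the", text, re.DOTALL)

print(repr(greedy.group(0)))
print(repr(lazy.group(0)))
```

The non-greedy match should end at "It is one of the", while the greedy one runs on to the "the" that starts the third line.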

lancif answered Oct 07 '22 17:10