
Regex not working to get string between 2 strings. Python 2.7 [duplicate]

Tags:

python

regex

From the page source at view-source:https://www.amazon.com/dp/073532753X?smid=A3P5ROKL5A1OLE I want to get the string between var iframeContent = and obj.onloadCallback = onloadCallback;

I have this regex: iframeContent(.*?)obj.onloadCallback = onloadCallback;

But it does not work. I am not good at regex so please pardon my lack of knowledge.

I even tried iframeContent(.*?)obj.onloadCallback but it does not work.

Umair Ayub asked Feb 05 '23 21:02



2 Answers

It looks like you just want that giant encoded string. I believe your regex is failing for two reasons. First, you're not running in DOTALL mode, so . won't match newline characters and the pattern can't span multiple lines. Second, it runs into catastrophic backtracking, which can happen when a very long variable-length match can match the same characters as the pattern that follows it.

This should get what you want:

import re
m = re.search(r'var iframeContent = "([^"]+)"', html_source)  # html_source holds the page source
print(m.group(1))  # m is None if the pattern is not found

The regex just looks for a run of characters other than double quotes ([^"]+) between two double quotes. Because the variable-length match and the pattern immediately after it can't match any of the same characters, you don't run into the catastrophic backtracking issue.
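To make the negated character class concrete, here is a minimal sketch on a made-up sample. The html_source value and the extract_iframe_content helper are assumptions for illustration only; the real page source is far longer and may be laid out differently.

```python
import re

# Hypothetical stand-in for the page source; the real payload is far longer.
html_source = (
    'var iframeContent = "%3Cdiv%3EHello%3C%2Fdiv%3E";\n'
    'obj.onloadCallback = onloadCallback;\n'
)


def extract_iframe_content(source):
    """Return the quoted value assigned to iframeContent, or None if absent."""
    # [^"]+ consumes everything up to the next double quote, so the match
    # stops at the closing quote without needing any flags.
    m = re.search(r'var iframeContent = "([^"]+)"', source)
    return m.group(1) if m else None


content = extract_iframe_content(html_source)
print(content)
```

Returning None when nothing matches avoids the AttributeError that m.group(1) would raise on a failed search.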

Brendan Abel answered Feb 08 '23 11:02



I suspect that the input string spans multiple lines. Try passing re.S (re.DOTALL) in the search call (i.e. re.findall('someString', text_Holder, re.S)). Note that re.M only changes how ^ and $ behave; it does not let . match newlines.
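A quick way to see how the flags behave is to run the question's pattern against a small multi-line string. This is only a sketch: the text_holder sample is invented, and the dot in obj.onloadCallback is escaped here so that it matches a literal period.

```python
import re

# Made-up sample: the closing marker sits on a later line than the opening one.
text_holder = 'var iframeContent = "abc";\nobj.onloadCallback = onloadCallback;'
pattern = r'iframeContent(.*?)obj\.onloadCallback'

no_flag = re.findall(pattern, text_holder)          # . stops at "\n"
multiline = re.findall(pattern, text_holder, re.M)  # re.M only affects ^ and $
dotall = re.findall(pattern, text_holder, re.S)     # re.S lets . match "\n"

print(no_flag)
print(multiline)
print(dotall)
```

On this sample only the re.S call finds a match, because the text between the two markers contains a newline.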

Fejs answered Feb 08 '23 12:02
