From this URL view-source:https://www.amazon.com/dp/073532753X?smid=A3P5ROKL5A1OLE
I want to get string between var iframeContent = and obj.onloadCallback = onloadCallback;
I have this regex iframeContent(.*?)obj.onloadCallback = onloadCallback;
But it does not work. I am not good at regex so please pardon my lack of knowledge.
I even tried iframeContent(.*?)obj.onloadCallback but it does not work.
It looks like you just want that giant encoded string. I believe yours is failing for two reasons. You're not running in DOTALL mode, which means your . won't match across multiple lines, and your regex is failing because of catastrophic backtracking, which can happen when you have a very long variable length match that matches the same characters as the ones following it.
This should get what you want
m = re.search(r'var iframeContent = \"([^"]+)\"', html_source)
print m.group(1)
The regex is just looking for any characters except double quotes [^"] in between two double quotes. Because the variable length match and the match immediately after it don't match any of the same characters, you don't run into the catastrophic backtracking issue.
I suspect that input string lies across multiple lines.Try adding re.M in search line (ie. re.findall('someString', text_Holder, re.M)).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With