From the page source at view-source:https://www.amazon.com/dp/073532753X?smid=A3P5ROKL5A1OLE
I want to get the string between var iframeContent =
and obj.onloadCallback = onloadCallback;
I have this regex: iframeContent(.*?)obj.onloadCallback = onloadCallback;
But it does not work. I am not good at regex, so please pardon my lack of knowledge.
I even tried iframeContent(.*?)obj.onloadCallback
but that does not work either.
It looks like you just want that giant encoded string. I believe yours is failing for two reasons. First, you're not running in DOTALL mode, which means your . won't match across multiple lines. Second, your regex may be hitting catastrophic backtracking, which can happen when a very long variable-length match can match the same characters as the pattern that follows it.
This should get what you want:
import re
# html_source is assumed to already hold the page source as a string
m = re.search(r'var iframeContent = "([^"]+)"', html_source)
print(m.group(1))
The regex is just looking for any characters except double quotes, [^"], in between two double quotes. Because the variable-length match and the match immediately after it don't match any of the same characters, you don't run into the catastrophic backtracking issue.
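To see the difference between the two approaches, here is a minimal sketch run on a made-up stand-in for the page source. The html_source value below is hypothetical and much shorter than the real Amazon markup, so it only illustrates the DOTALL behaviour and the negated character class, not the backtracking cost.

```python
import re

# Hypothetical stand-in for the page source; the real page is much longer.
html_source = (
    'var iframeContent = "%3Cdiv%3Ehello%3C%2Fdiv%3E";\n'
    'var obj = {};\n'
    'obj.onloadCallback = onloadCallback;\n'
)

# The question's approach: without re.DOTALL, "." stops at the first newline.
lazy = r'iframeContent(.*?)obj\.onloadCallback = onloadCallback;'
no_flag = re.search(lazy, html_source)
with_dotall = re.search(lazy, html_source, re.DOTALL)

# Negated character class: captures only what sits between the two quotes.
quoted = re.search(r'var iframeContent = "([^"]+)"', html_source)

print(no_flag)                  # None
print(with_dotall is not None)  # True
print(quoted.group(1))          # %3Cdiv%3Ehello%3C%2Fdiv%3E
```

On real input it is worth checking that re.search did not return None before calling group(1).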
I suspect that the input string spans multiple lines. Try adding re.S (re.DOTALL) in the search line (i.e. re.findall('someString', text_Holder, re.S)). Note that re.M only changes how ^ and $ behave; it does not let . match a newline.
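A quick way to check which flag actually helps is to try both on a small multi-line string. In this sketch the sample text and the pattern are made up for illustration: re.M only affects the ^ and $ anchors, while re.S lets . match a newline.

```python
import re

# Hypothetical multi-line input, standing in for text_Holder.
text_holder = 'start\nmiddle\nend'

pattern = r'start(.*?)end'

multiline = re.findall(pattern, text_holder, re.M)  # re.M: only ^ and $ change
dotall = re.findall(pattern, text_holder, re.S)     # re.S: "." also matches "\n"

print(multiline)  # []
print(dotall)     # ['\nmiddle\n']
```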