I have a very large string and I would like to find a small string or value inside it (in my example, 14). The catch is that 78 is dynamic and I get its value from a dict (someDict). A snippet of the string, along with my attempt, looks like this:
str1='dnas ANYTHING Here <td class="tr js-name"><a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>'
str2="/myportal/report/"+str(someDict["Id"])+"/abc/xyz/"
p = re.compile(r'str2\s*(.*?)\"')
match = p.search(str1)
if match:
    print(match.group(1))
else:
    print("cant find it")
I know there is something wrong with p = re.compile(r'str2\s*(.*?)\"'), since I can't just put str2 inside the pattern string. How do I use re.compile with a variable like this?
The string you are parsing looks like HTML, and regular expressions are not the best tool for that job. I would use a more specialized tool, an HTML parser such as BeautifulSoup:
from urllib.parse import urlparse
from bs4 import BeautifulSoup
data = 'dnas ANYTHING Here <td class="tr js-name"><a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>'
soup = BeautifulSoup(data, "html.parser")
href = soup.select_one("td.tr.js-name > a")["href"]
parsed_url = urlparse(href)
print(parsed_url.path.split("/")[-1])
Prints 14.
Note that here td.tr.js-name > a is a CSS selector that is one of the techniques you could use to locate elements in the HTML:
- > denotes a direct parent->child relationship
- td.tr.js-name would match a td element having tr and js-name class values
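If you would still rather stay with a regular expression, the problem in the question is that r'str2...' contains the literal text str2, not the value of the variable. A minimal sketch of one way around that is to concatenate the variable into the pattern and pass it through re.escape so it is matched literally. The someDict contents below are an assumed stand-in for the real dict, and the capture group assumes the wanted value is the last path segment before the closing quote:

```python
import re

# Assumed stand-in for the asker's dict; only the "Id" key is used here.
someDict = {"Id": 78}

str1 = 'dnas ANYTHING Here <td class="tr js-name"><a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>'

# Build the dynamic prefix from the dict value.
str2 = "/myportal/report/" + str(someDict["Id"]) + "/abc/xyz/"

# re.escape makes the prefix match literally; the group then captures
# everything up to the next slash or closing double quote.
p = re.compile(re.escape(str2) + r'([^"/]+)"')

match = p.search(str1)
result = match.group(1) if match else None

if result is not None:
    print(result)
else:
    print("can't find it")
```

This works for the snippet shown, but it depends on the exact attribute quoting and URL layout, which is why the HTML parser approach is likely to hold up better on real pages.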