
Python using Regex find a string with dynamic value in a big string

Tags: python, regex

I have a very large string and I would like to find a small value inside it (14 in my example). The catch is that 78 is dynamic, and I get its value from a dict (someDict). A snippet of the string and my attempt look like this:

str1='dnas  ANYTHING Here <td class="tr js-name"><a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>'

str2="/myportal/report/"+str(someDict["Id"])+"/abc/xyz/"

p = re.compile(r'str2\s*(.*?)\"')
match = p.search(str1)
if match:
    print(match.group(1))
else:
    print("cant find it")

I know there is something wrong with p = re.compile(r'str2\s*(.*?)\"'), since I can't just put the name str2 inside the pattern string. How do I use compile with a dynamic value like this?

Ghost asked May 20 '26 07:05


1 Answer

The string you are parsing looks like HTML, and regular expressions are not the best tool for that job. I would use a more specialized tool, an HTML parser such as BeautifulSoup:

from urllib.parse import urlparse

from bs4 import BeautifulSoup


data = 'dnas  ANYTHING Here <td class="tr js-name"><a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>'

soup = BeautifulSoup(data, "html.parser")
href = soup.select_one("td.tr.js-name > a")["href"]

parsed_url = urlparse(href)
print(parsed_url.path.split("/")[-1])

Prints 14.
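
If a regular expression is still preferred, the original attempt fails because 'str2' inside the pattern is just the literal text "str2", not the variable. A minimal sketch of one way around that is to concatenate the escaped prefix into the pattern. The value of someDict is assumed here for illustration, and the capture group assumes the wanted value runs up to the closing double quote of the href:

```python
import re

# Sample input from the question.
str1 = ('dnas  ANYTHING Here <td class="tr js-name">'
        '<a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>')

# Assumption: someDict holds the dynamic id, as described in the question.
someDict = {"Id": 78}

# Build the literal prefix, then escape it so that any regex
# metacharacters in the dynamic part are matched literally.
prefix = "/myportal/report/" + str(someDict["Id"]) + "/abc/xyz/"
pattern = re.compile(re.escape(prefix) + r'([^"]*)"')

match = pattern.search(str1)
result = match.group(1) if match else None
print(result if result is not None else "can't find it")  # prints 14
```

re.escape matters mostly when the dynamic part can contain characters such as . or +; for a plain integer id it is a harmless precaution.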

Note that td.tr.js-name > a, the argument passed to select_one above, is a CSS selector, which is one of the techniques you can use to locate elements in the HTML:

  • > denotes a direct parent->child relationship
  • td.tr.js-name would match a td element having tr and js-name class values
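
The two selector rules above can be checked on a small example. This is only a sketch: the markup is hypothetical and was made up to show the difference between the direct-child form (with >) and the plain descendant form (with a space):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link is a direct child of the matching td,
# another is nested inside a span, and a third sits in a td that
# has only the "tr" class.
html = (
    '<table><tr>'
    '<td class="tr js-name"><a href="/direct">direct</a>'
    '<span><a href="/nested">nested</a></span></td>'
    '<td class="tr"><a href="/other">other</a></td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# ">" keeps only <a> elements that are direct children of the td.
direct_only = [a["href"] for a in soup.select("td.tr.js-name > a")]
# A space matches <a> elements at any depth below the td.
any_depth = [a["href"] for a in soup.select("td.tr.js-name a")]

print(direct_only)  # ['/direct']
print(any_depth)    # ['/direct', '/nested']
```

The link in the second td is never matched, because that td lacks the js-name class.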

alecxe answered May 22 '26 17:05


