I am trying to identify tags in an HTML document based on part of an attribute value.
For example, if I have a BeautifulSoup object:
import requests
from bs4 import BeautifulSoup
r = requests.get("http:/My_Page")
soup = BeautifulSoup(r.text, "html.parser")
I want tr tags with an id attribute whose values are formatted like this: "news_4343_23255_xxx". I'm interested in any tr tag as long as it has "news" as the first 4 characters of the id attribute value.
I know I can search as follows:
trs = soup.find_all("tr", attrs={"id": True})
which gives me all tr tags with an id attribute.
How do I search based on a substring?
Use a regex to get tr tags with an id starting with "news".
Example:
from bs4 import BeautifulSoup
import re
soup = BeautifulSoup(html, "html.parser")
for i in soup.find_all("tr", {'id': re.compile(r'^news')}):
    print(i)
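To try the regex approach without fetching a page, here is a minimal self-contained sketch. The sample markup and the names `news_rows` and `news_ids` are made up for illustration; only the id format comes from the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<table>
  <tr id="news_4343_23255_xxx"><td>first</td></tr>
  <tr id="other_1"><td>second</td></tr>
  <tr><td>no id at all</td></tr>
  <tr id="news_9_9_yyy"><td>third</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The ^ anchor matters: Beautiful Soup applies the pattern with search(),
# so without it "news" would match anywhere inside the id value.
news_rows = soup.find_all("tr", id=re.compile(r"^news"))

# Collect just the id values of the matching rows.
news_ids = [row["id"] for row in news_rows]
print(news_ids)
```

Rows without an id attribute are skipped automatically, so no separate check for a missing id is needed here.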
Try this:
trs = soup.find_all("tr", id=lambda x: x and x.startswith('news_'))
referenced here: Matching id's in BeautifulSoup
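As a rough comparison, the function-based filter can be written as a named function, and a CSS attribute selector should give the same rows. This is only a sketch: the sample markup and the helper name `id_starts_with_news` are invented for illustration, and `select()` relies on the soupsieve package that recent Beautiful Soup versions install as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; "newsletter_1" shows why the trailing
# underscore in the prefix can matter.
html = """
<table>
  <tr id="news_4343_23255_xxx"><td>first</td></tr>
  <tr id="newsletter_1"><td>second</td></tr>
  <tr><td>no id at all</td></tr>
  <tr id="news_9_9_yyy"><td>third</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def id_starts_with_news(value):
    # Beautiful Soup passes None when the tag has no id attribute.
    return value is not None and value.startswith("news_")


# Function filter, equivalent to the lambda above.
by_function = soup.find_all("tr", id=id_starts_with_news)

# CSS attribute selector: ^= means "attribute value starts with".
by_selector = soup.select('tr[id^="news_"]')

function_ids = [row["id"] for row in by_function]
selector_ids = [row["id"] for row in by_selector]
print(function_ids)
print(selector_ids)
```

With the "news_" prefix, the "newsletter_1" row is excluded; matching on plain "news" would include it.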