Beautiful Soup Find Tags based on partial attribute value

I am trying to find tags in an HTML document based on part of an attribute value.

For example, if I have a BeautifulSoup object:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://My_Page")  # placeholder URL

soup = BeautifulSoup(r.text, "html.parser")

I want the tr tags whose id attribute value is formatted like this: "news_4343_23255_xxx". I'm interested in any tr tag as long as the first 4 characters of its id attribute value are "news".

I know I can search as follows:

trs = soup.find_all("tr", attrs={"id": True})

which gives me all tr tags with an id attribute.

How do I search based on a substring?

Windstorm1981 asked May 31 '18 17:05


2 Answers

Use a regex to get the tr tags whose id starts with "news". For example:

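from bs4 import BeautifulSoup
import re

# html is assumed to hold the page source (e.g. r.text from the question)
soup = BeautifulSoup(html, "html.parser")
# ^news anchors the pattern at the start of the id value
for tr in soup.find_all("tr", {"id": re.compile(r"^news")}):
    print(tr)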
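
A self-contained version of the same idea, as a minimal sketch. The sample markup and the names news_rows and news_ids are made up for illustration and do not come from the question. BeautifulSoup applies a compiled pattern with its search() method, so the ^ anchor is what keeps an id such as "old_news_1" from matching.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<table>
  <tr id="news_4343_23255_xxx"><td>first</td></tr>
  <tr id="old_news_1"><td>second</td></tr>
  <tr id="news_1_2_yyy"><td>third</td></tr>
  <tr><td>no id at all</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only <tr> tags whose id begins with "news".
news_rows = soup.find_all("tr", id=re.compile(r"^news"))
news_ids = [row["id"] for row in news_rows]
print(news_ids)
```

Rows without an id, and rows where "news" appears later in the id, are skipped.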
Rakesh answered Oct 11 '22 15:10


Try this:

trs = soup.find_all("tr", id=lambda x: x and x.startswith('news_'))

referenced here: Matching id's in BeautifulSoup
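
A runnable sketch of the function-based filter, under the assumption of some made-up sample markup (the html string and the variable names are hypothetical). The "x and" guard is there because the function may be called with None for a tr that has no id. As a possible alternative that is not part of this answer, a CSS attribute selector with ^= ("value begins with") passed to select() should give the same rows.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<table>
  <tr id="news_4343_23255_xxx"><td>first</td></tr>
  <tr id="other_77"><td>second</td></tr>
  <tr><td>no id at all</td></tr>
  <tr id="news_9_9_zzz"><td>third</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Function filter: x is the id value, or None when the tag has no id,
# so "x and ..." avoids calling startswith on None.
by_function = soup.find_all("tr", id=lambda x: x and x.startswith("news_"))

# Alternative (an assumption, not from the answer): CSS "begins with" selector.
by_selector = soup.select('tr[id^="news_"]')

function_ids = [tr["id"] for tr in by_function]
selector_ids = [tr["id"] for tr in by_selector]
print(function_ids)
print(selector_ids)
```

Both lists come back in document order, so they can be compared directly.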

return answered Oct 11 '22 13:10