In BeautifulSoup, if I want to find all divs whose class is span3, I'd just do:
result = soup.findAll("div",{"class":"span3"})
However, in my case I want to find all divs whose class starts with span3, so BeautifulSoup should find:
<div id="span3 span49"> <div id="span3 span39">
And so on...
How do I achieve what I want? I am familiar with regular expressions; however, I do not know how to use them with BeautifulSoup, and I did not find any help in BeautifulSoup's documentation.
find() returns the first matching element, or None if nothing on the page matches. find_all() scans the entire document and returns all the matches.
BeautifulSoup has many methods for searching a parse tree. The two most commonly used are find() and find_all(). Before talking about find() and find_all(), let us see some examples of different filters you can pass into these methods.
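To make the difference concrete, here is a minimal sketch. The HTML snippet is made up for illustration, and the built-in "html.parser" is assumed so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, not taken from the question.
html = """
<div class="span3">first</div>
<div class="span3">second</div>
<p class="note">other</p>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag (or None).
first_div = soup.find("div", class_="span3")
first_text = first_div.get_text()

# find_all() walks the whole document and returns a list-like ResultSet.
all_divs = soup.find_all("div", class_="span3")
all_texts = [d.get_text() for d in all_divs]

# When nothing matches, find() gives None and find_all() gives an empty result.
missing = soup.find("table")
missing_all = soup.find_all("table")
```

Because find() can return None, it is usually worth checking the result before calling methods on it.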
Well, these are id attributes you are showing:
<div id="span3 span49"> <div id="span3 span39">
In this case, you can use:
soup.find_all("div", id=lambda value: value and value.startswith("span3"))
Or:
import re
soup.find_all("div", id=re.compile("^span3"))
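Here is a small self-contained sketch of both variants. The markup is the two divs from the question plus two made-up ones (an id that does not start with span3, and a div with no id at all) to show what gets filtered out.

```python
import re
from bs4 import BeautifulSoup

# First two divs come from the question; the last two are made up as counter-examples.
html = """
<div id="span3 span49">a</div>
<div id="span3 span39">b</div>
<div id="span4 span3">c</div>
<div>no id</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Function filter: called with the attribute value, or None when the attribute
# is missing, hence the "value and ..." guard.
by_lambda = soup.find_all("div", id=lambda value: value and value.startswith("span3"))

# Regex filter: BeautifulSoup uses the pattern's search(), so ^ anchors it to the start.
by_regex = soup.find_all("div", id=re.compile("^span3"))

lambda_ids = [d["id"] for d in by_lambda]
regex_ids = [d["id"] for d in by_regex]
```

Both calls should pick up only the first two divs; the third has span3 in its id, but not at the start.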
If this was just a typo and you actually have class attributes that start with span3, and you really need to check that the class starts with span3, you can use the "starts-with" CSS selector:
soup.select("div[class^=span3]")
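This is because you cannot check the class attribute the same way you checked the id attribute: class is special, because it is a multi-valued attribute. BeautifulSoup splits it into a list of values and applies a find_all() filter to each value separately, whereas the CSS selector tests the attribute string as a whole.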
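To see what "multi-valued" means in practice, here is a rough sketch comparing a regex filter on class with the "starts-with" selector. The three divs are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Made-up markup: span3 is the first class in "a", a prefix of the first
# class in "b", and the second class in "c".
html = """
<div class="span3 span49">a</div>
<div class="span39 wide">b</div>
<div class="wide span3">c</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class is stored as a list of values, not as one string.
first_classes = soup.find("div")["class"]

# The regex is tried against each class value, so "c" matches too.
by_regex = [d.get_text() for d in soup.find_all("div", class_=re.compile("^span3"))]

# The CSS selector looks at the whole attribute string, so "c" does not match.
by_css = [d.get_text() for d in soup.select('div[class^="span3"]')]
```

So the selector is the one that behaves like "the class attribute starts with span3", while the regex filter behaves like "some class starts with span3".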
This works too:
soup.select("div[class*=span3]")  # *= means "contains"
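One caveat with *= is that it matches the substring anywhere in the attribute, which may be broader than intended. A quick sketch with made-up markup, next to the standard ~= selector (whole whitespace-separated word) for comparison:

```python
from bs4 import BeautifulSoup

# Made-up markup to show how broad a substring match is.
html = """
<div class="span3 span49">a</div>
<div class="wide span39">b</div>
<div class="myspan3">c</div>
<div class="span4">d</div>
"""
soup = BeautifulSoup(html, "html.parser")

# *= matches "span3" anywhere in the attribute string, including inside "myspan3".
contains = [d.get_text() for d in soup.select('div[class*="span3"]')]

# ~= matches only when "span3" is one complete class name.
whole_word = [d.get_text() for d in soup.select('div[class~="span3"]')]
```

If only elements with the exact class span3 are wanted, ~= (or the shorthand div.span3) is probably the safer choice.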