I'd really like Beautiful Soup to match any tag from a list of tags, as in the example below. I know attribute filters accept a regex, but is there anything in Beautiful Soup that lets you do the same for tag names?
soup.findAll("(a|div)")
Output:
<a> ASDFS <div> asdfasdf <a> asdfsdf
My goal is to create a scraper that can grab tables from sites. Sometimes tags are named inconsistently, and I'd like to be able to input a list of tags to name the 'data' part of a table.
Recipe Objective - Working with specific strings using regular expressions and Beautiful Soup. To work with strings, we will use Python's "re" library, which provides regular expressions. Regular Expression (regex) - A regular expression is a pattern used to match specified strings in the data.
find() method: The find method finds the first tag with the specified name or id and returns it as a bs4 Tag object, or None if nothing matches. Example: For instance, consider this simple HTML webpage having different paragraph tags.
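A minimal sketch of find() on such a page. The HTML string and the id values here are made up for illustration and are not taken from the original example:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page with several paragraph tags
html = """
<html><body>
<p id="first">First paragraph</p>
<p id="second">Second paragraph</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first_p = soup.find("p")            # first <p> in document order
second_p = soup.find(id="second")   # first tag whose id is "second"
missing = soup.find("table")        # no match, so find() returns None

print(first_p.get_text())
print(second_p.get_text())
print(missing)
```

Because find() returns None when nothing matches, it is worth checking the result before calling methods on it.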
We could just use find_all() again to find all the tr tags, but we can also iterate over these tags in a more straightforward manner. The children attribute returns an iterator over all the nodes directly beneath the parent tag, which here is table, so it yields all the tr tags (along with any whitespace text nodes between them).
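As a rough sketch of iterating over children, assuming a small made-up table parsed with html.parser (other parsers may insert a tbody element, which would change what the direct children are):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-row table; the newline between rows becomes a text node
html = "<table><tr><td>1</td><td>2</td></tr>\n<tr><td>3</td><td>4</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# .children yields every direct child, so keep only real tags
rows = [child for child in table.children if isinstance(child, Tag)]

# Collect the text of each cell, row by row
cells = [[td.get_text() for td in row.find_all("td")] for row in rows]
print(cells)
```

The isinstance check is there because children includes text nodes such as the newline between the two rows.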
Note that you can also use regular expressions to search in attributes of tags. For example:
import re
from bs4 import BeautifulSoup
soup.find_all('a', {'href': re.compile(r'crummy\.com/')})
This example finds all <a> tags whose href contains the substring 'crummy.com/'.
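A self-contained version of the attribute search, as a sketch only. The three links are invented sample data, and the href keyword argument is used here as an equivalent of the attribute dictionary:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample links; only the first points at crummy.com
html = """
<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>
<a href="https://example.com/">Example</a>
<a>No href</a>
"""
soup = BeautifulSoup(html, "html.parser")

# The pattern is searched for anywhere inside each href value
links = soup.find_all("a", href=re.compile(r"crummy\.com/"))
hrefs = [a["href"] for a in links]
print(hrefs)
```

Tags without an href attribute are simply skipped by the filter.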
find_all() is the most favored method in the Beautiful Soup search API.
You can pass it a variety of filters. To find multiple tags, pass a list of tag names:
>>> soup.find_all(['a', 'div'])
Example:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<html><body><div>asdfasdf</div><p><a>foo</a></p></body></html>', 'html.parser')
>>> soup.find_all(['a', 'div'])
[<div>asdfasdf</div>, <a>foo</a>]
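Or you can use a regular expression to find tags whose names contain a or div (Beautiful Soup applies the pattern with search(), so names such as table also match; use "^(a|div)$" to match only a and div):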
>>> import re
>>> soup.find_all(re.compile("(a|div)"))
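To make the difference between the unanchored and anchored patterns concrete, here is a small sketch on an invented snippet. It relies on Beautiful Soup filtering tag names with the regex's search() method:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup mixing a table with the tags we actually want
html = "<table><tr><td><a>foo</a></td></tr></table><div>bar</div>"
soup = BeautifulSoup(html, "html.parser")

# Unanchored: matches any tag name that contains "a" or "div"
loose = [t.name for t in soup.find_all(re.compile("(a|div)"))]

# Anchored: matches only tags named exactly "a" or "div"
strict = [t.name for t in soup.find_all(re.compile("^(a|div)$"))]

print(loose)
print(strict)
```

For a fixed set of names, the list form find_all(['a', 'div']) gives the same result as the anchored pattern and is easier to read; the regex is mainly useful when the names follow a pattern.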