I'm looking for a way to use findAll to get two tags, in the order they appear on the page.
Currently I have:
import requests
import BeautifulSoup

def get_soup(url):
    request = requests.get(url)
    page = request.text
    soup = BeautifulSoup(page)
    get_tags = soup.findAll('hr' and 'strong')
    for each in get_tags:
        print each
If I use that on a page with only 'em' or 'strong' in it, then it will get me all of those tags; if I use it on a page with both, it will get only the 'strong' tags.
Is there a way to do this? My main concern is preserving the order in which the tags are found.
To search for multiple tags at once, pass a list of tag names to find() or find_all(); a dictionary is used for attribute filters, not for tag names. Both functions are provided by Beautiful Soup, a Python library for scraping data from web pages, and they select elements by tag name or attributes.
find returns only the first tag in the parsed document that satisfies the condition, or None if nothing matches. find_all scans the entire document and returns all the matches.
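The difference between the two calls can be sketched as follows. This is a minimal illustration that assumes the bs4 package (Beautiful Soup 4) and Python's built-in "html.parser"; the sample markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<div><strong>one</strong><hr/><strong>two</strong></div>"
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (the first match), or None if nothing matches.
first = soup.find("strong")
first_text = first.get_text()

# find_all() returns a list-like ResultSet with every match in the document.
every = soup.find_all("strong")
all_texts = [tag.get_text() for tag in every]

# There is no <em> in the sample, so find() gives None here.
missing = soup.find("em")

print(first_text, all_texts, missing)
```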
With findAll("p", {"class": "pagination-container and something"}), BeautifulSoup matches an element whose class attribute value is exactly that string. There is no splitting involved in this case: it just sees that there is an element where the complete class value equals the desired string.
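A small sketch of that class-matching behaviour, assuming the bs4 package and the built-in "html.parser". The three paragraphs are hypothetical sample markup: one with the exact class string, one with the same classes in a different order, and one with a single class.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = (
    '<p class="pagination-container and something">exact</p>'
    '<p class="something pagination-container and">reordered</p>'
    '<p class="pagination-container">single</p>'
)
soup = BeautifulSoup(html, "html.parser")

# A multi-word class string is compared against the complete attribute value,
# so the paragraph with the reordered classes is not returned.
exact = soup.find_all("p", {"class": "pagination-container and something"})

# A single class name matches any element that carries that class.
single = soup.find_all("p", {"class": "pagination-container"})

print([p.get_text() for p in exact])
print([p.get_text() for p in single])
```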
You could pass a list, to find any of the given tags:
tags = soup.find_all(['hr', 'strong'])
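As a rough check that the list form keeps document order, and of why the original 'hr' and 'strong' argument found only one kind of tag, here is a minimal sketch. It assumes the bs4 package and the built-in "html.parser"; the markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<div><strong>a</strong><hr/><strong>b</strong><hr/><em>c</em></div>"
soup = BeautifulSoup(html, "html.parser")

# In Python, `x and y` evaluates to y when x is truthy, so the expression
# from the question is just the string 'strong' before findAll ever sees it.
name_from_and = 'hr' and 'strong'

# Passing a list matches any of the names; results come back in the order
# the tags appear in the document, not grouped by tag name.
tags = soup.find_all(["hr", "strong"])
order = [tag.name for tag in tags]

print(name_from_and)
print(order)
```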
Use regular expressions:
import re
get_tags = soup.findAll(re.compile(r'(hr|strong)'))
The expression r'(hr|strong)' will find either hr tags or strong tags.
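One caveat worth sketching: Beautiful Soup applies a compiled pattern to the tag name with search(), so an unanchored pattern also matches names that merely contain "hr" or "strong". The example below assumes the bs4 package and the built-in "html.parser"; the <thread> element is a made-up custom tag used only to show the difference, and anchoring with ^ and $ is a suggested variation, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; <thread> is an invented tag whose name contains "hr".
html = "<div><thread>x</thread><strong>a</strong><hr/></div>"
soup = BeautifulSoup(html, "html.parser")

# Unanchored: matches any tag name containing "hr" or "strong".
loose = [t.name for t in soup.find_all(re.compile(r"(hr|strong)"))]

# Anchored: matches only tag names that are exactly "hr" or "strong".
strict = [t.name for t in soup.find_all(re.compile(r"^(hr|strong)$"))]

print(loose)
print(strict)
```

Both forms return matches in document order, like the list form.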