
Python BeautifulSoup give multiple tags to findAll

I'm looking for a way to use findAll to get two tags, in the order they appear on the page.

Currently I have:

    import requests
    from bs4 import BeautifulSoup

    def get_soup(url):
        request = requests.get(url)
        page = request.text
        soup = BeautifulSoup(page)
        get_tags = soup.findAll('hr' and 'strong')
        for each in get_tags:
            print(each)

If I use that on a page with only 'hr' or only 'strong' tags in it, it gets me all of those tags. If I use it on a page with both, it only gets the 'strong' tags.

Is there a way to do this? My main concern is preserving the order in which the tags are found.

DasSnipez asked Dec 18 '13 02:12



2 Answers

You could pass a list to find any of the given tags; the results come back in the order they appear in the document:

tags = soup.find_all(['hr', 'strong']) 
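
To illustrate, here is a minimal sketch using a made-up HTML snippet (the markup and variable names are assumptions for the example, not from the question). It also shows why the original call only found 'strong' tags: 'hr' and 'strong' is a plain Python expression that evaluates to 'strong' before findAll ever sees it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, only for illustration.
html = """
<p>intro</p>
<strong>first</strong>
<hr/>
<strong>second</strong>
<em>ignored</em>
<hr/>
"""

soup = BeautifulSoup(html, "html.parser")

# The `and` operator returns its last operand when the first one is truthy,
# so this is the same as passing just 'strong'.
only_strong = 'hr' and 'strong'

# Passing a list matches any of the given tag names, in document order.
tags = soup.find_all(['hr', 'strong'])
names = [tag.name for tag in tags]
print(names)
```

With this sample input, the tag names come out interleaved exactly as they occur in the markup rather than grouped by tag.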
jfs answered Sep 21 '22 18:09


Use regular expressions:

    import re
    get_tags = soup.findAll(re.compile(r'(hr|strong)'))

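The expression r'(hr|strong)' will find hr tags and strong tags. Because the pattern is unanchored, it can also match other tag names that contain those strings; r'^(hr|strong)$' restricts it to exactly those two tags.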
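
As a rough sketch of the difference anchoring makes, the snippet below compares an unanchored and an anchored pattern. The sample markup, including the invented <hrule> tag, is an assumption purely for demonstration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; <hrule> is a made-up tag whose name starts with "hr".
html = "<hrule>x</hrule><hr/><strong>bold</strong><em>skip</em>"
soup = BeautifulSoup(html, "html.parser")

# Unanchored: any tag name in which the pattern is found.
loose = [t.name for t in soup.find_all(re.compile(r'(hr|strong)'))]

# Anchored: only tag names that are exactly "hr" or "strong".
strict = [t.name for t in soup.find_all(re.compile(r'^(hr|strong)$'))]

print(loose)
print(strict)
```

For a fixed set of tag names, passing a list is usually simpler; a regular expression is mostly useful when the names follow a pattern.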

TerryA answered Sep 21 '22 18:09