
Beautiful Soup Nested Tag Search

I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I have difficulty accessing nested HTML tags (for example, <p class="hello"> inside a <div>).

Every time I try to find such a tag with the page.findAll() method (page is the Beautiful Soup object containing the whole page), it finds nothing, even though the tags are there. Is there a simple way to do this?

Asafwr asked Oct 01 '17 09:10



2 Answers

If I understand correctly, you want to first find a specific div tag, then search for all the p tags inside it and count them (or do whatever you need with them). For example:

import bs4

# content is assumed to hold the page's HTML as a string
soup = bs4.BeautifulSoup(content, 'html.parser')

# Get the first div with class "some_class" (find() returns None if nothing matches)
div_container = soup.find('div', class_='some_class')

# Then search inside div_container for all p tags with class "hello"
for ptag in div_container.find_all('p', class_='hello'):
    # Print the text content of the p tag
    print(ptag.text)

Hope that helps
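
As a self-contained illustration of the same idea, here is a minimal sketch that also counts words, since that was the goal in the question. The sample HTML and its class names are made up for the example, and it uses a CSS descendant selector via select() instead of chaining find() and find_all(); treat it as one possible approach, not the only one.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the markup and class names are invented for illustration.
html = """
<div class="some_class">
  <p class="hello">Hello nested world</p>
  <p class="other">Not counted</p>
  <section><p class="hello">Deeper still</p></section>
</div>
<p class="hello">Outside the div</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: every <p class="hello"> at any depth inside div.some_class
paragraphs = soup.select("div.some_class p.hello")

# Count whitespace-separated words across the matched paragraphs
word_count = sum(len(p.get_text().split()) for p in paragraphs)
print(word_count)
```

Unlike find() followed by find_all(), select() simply returns an empty list when the outer div is missing, so there is no None to guard against.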

Mario Kirov answered Oct 14 '22 15:10


Try this:

data = []
# For each outer tag, collect the matching tags nested inside it
for nested_soup in soup.find_all('xyz'):
    data.extend(nested_soup.find_all('abc'))
# data now holds all the nested matches in one list

You could probably condense this into a lambda or a one-liner, but this works. Thanks.
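
For reference, one way the loop could be condensed is a nested list comprehension. This is only a sketch: the sample HTML and the ul/li tag names are placeholders chosen for the example, standing in for 'xyz' and 'abc'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; ul and li stand in for the outer and inner tags.
html = """
<ul><li>one</li><li>two</li></ul>
<ul><li>three</li></ul>
<ol><li>not in a ul</li></ol>
"""

soup = BeautifulSoup(html, "html.parser")

# For every outer <ul>, collect each <li> found inside it, flattened into one list
items = [li for ul in soup.find_all("ul") for li in ul.find_all("li")]
texts = [li.get_text() for li in items]
print(texts)
```

Note that if the outer tags can be nested inside each other, an inner tag may be collected more than once, because find_all() searches all descendants.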

Maifee Ul Asad answered Oct 14 '22 17:10