Beautiful Soup Nested Tag Search

Tags: python, html, beautifulsoup

I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I am having difficulty accessing nested HTML tags (for example, <p class="hello"> inside a <div>).

Every time I try to find such a tag with the page.findAll() method (page is the Beautiful Soup object containing the whole page), it doesn't find any, even though they are there. Is there a simple method or another way to do this?
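A minimal sketch of the behaviour in question, assuming a small hypothetical HTML string in place of the real page and treating "words" as whitespace-separated tokens of the page text. In Beautiful Soup 4, find_all() (findAll() is the older alias) searches all descendants by default, so a nested <p class="hello"> can be found directly from the top-level soup object:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a <p class="hello"> nested inside a <div>
html = """
<html><body>
  <div id="content">
    <p class="hello">Hello nested world</p>
    <p>Another paragraph here</p>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() searches all descendants by default (recursive=True),
# so the nested <p class="hello"> is found from the top-level soup object.
hello_paragraphs = soup.find_all("p", class_="hello")
print(len(hello_paragraphs))  # → 1

# Count whitespace-separated words in all of the page's text (an assumed definition of "word").
word_count = len(soup.get_text().split())
print(word_count)  # → 6
```

On a real page, get_text() also returns the contents of <script> and <style> elements, so a word counter would probably want to remove those first (for example with decompose()).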


asked Oct 01 '17 09:10

Asafwr

2 Answers

I'm guessing that what you're trying to do is first find a specific div tag, then search for all p tags inside it and count them (or do whatever else you need with them). For example:

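import bs4

# content is assumed to hold the page's HTML as a string
soup = bs4.BeautifulSoup(content, 'html.parser')

# This will get the div
div_container = soup.find('div', class_='some_class')

# Then search in that div_container for all p tags with class "hello"
# (find() returns None if no matching div exists)
if div_container is not None:
    for ptag in div_container.find_all('p', class_='hello'):
        # prints the p tag's text content
        print(ptag.text)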

Hope that helps
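A self-contained version of the approach above, assuming hypothetical sample markup (some_class and hello are the placeholder class names from the answer). It also shows a CSS-selector alternative, soup.select(), which expresses "p.hello anywhere inside div.some_class" in a single call; in recent Beautiful Soup 4 releases select() is backed by the soupsieve package, which is installed as a dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; "some_class" and "hello" are the answer's placeholder classes
html = """
<div class="some_class">
  <p class="hello">first</p>
  <p class="other">skip me</p>
  <p class="hello">second</p>
</div>
<p class="hello">outside the div</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Approach from the answer: find the div, then search only inside it
div_container = soup.find("div", class_="some_class")
texts = []
if div_container is not None:
    texts = [p.get_text() for p in div_container.find_all("p", class_="hello")]
print(texts)  # → ['first', 'second']

# CSS selector: every <p class="hello"> inside any <div class="some_class">
selected = [p.get_text() for p in soup.select("div.some_class p.hello")]
print(selected)  # → ['first', 'second']
```

One difference worth noting: find() returns only the first matching div, whereas the selector looks inside every div.some_class on the page.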


answered Oct 14 '22 15:10

Mario Kirov

Try this:

data = []
for nested_soup in soup.find_all('xyz'):
    # add every 'abc' tag found inside this 'xyz' tag
    data.extend(nested_soup.find_all('abc'))
# data now holds all the matching 'abc' tags together

You could probably turn this into a lambda to make it more concise, but this works. Thanks.
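The loop above can also be written more compactly as a nested list comprehension (a comprehension rather than the lambda the answer mentions). A sketch, assuming hypothetical markup where xyz and abc stand in for real tag names, as in the answer:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: 'xyz' and 'abc' are the answer's placeholder tag names
html = """
<xyz><abc>1</abc><abc>2</abc></xyz>
<abc>not inside xyz</abc>
<xyz><abc>3</abc></xyz>
"""

soup = BeautifulSoup(html, "html.parser")

# Same result as the loop: for each 'xyz' tag, take every 'abc' tag nested inside it
data = [abc for xyz in soup.find_all("xyz") for abc in xyz.find_all("abc")]
print([tag.get_text() for tag in data])  # → ['1', '2', '3']

# CSS descendant selector: each matching 'abc' tag is returned once, in document order
via_css = soup.select("xyz abc")
print([tag.get_text() for tag in via_css])  # → ['1', '2', '3']
```

If xyz tags can be nested inside one another, the loop and the comprehension may collect the same abc tag more than once, while select() returns each matching tag only once.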

answered Oct 14 '22 17:10

Maifee Ul Asad
