
beautifulsoup: find_all on bs4.element.ResultSet object or list?

Hi, I called find_all on a BeautifulSoup object and got back a result that is a bs4.element.ResultSet (or, once sliced, a plain list).

I want to call find_all again on that result, but a bs4.element.ResultSet does not have that method. I can loop over its elements and call find_all on each one, but can I avoid the loop and convert the result back into a BeautifulSoup object instead?

Please see the code below for details. Thanks.

from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, 'html.parser')

type(soup)  # bs4.BeautifulSoup

# call find_all on the BeautifulSoup object
th_all = soup.find_all('th')

# the result is a bs4.element.ResultSet; slicing it gives a plain list
type(th_all)  # bs4.element.ResultSet
type(th_all[0:1])  # list

# now I want to call find_all again on the result
th_all.find_all(text='A')  # does not work: raises AttributeError

# can I avoid this loop?
for th in th_all:
    th.find_all(text='A')  # works
YJZ asked Mar 18 '16 04:03



1 Answer

ResultSet is a subclass of list, not of Tag, and the find* methods are defined on Tag. That is why find_all() cannot be called on it directly. Looping through the results of find_all() is the most common approach:

th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
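The same loop can be written more compactly. The following is a minimal sketch, not the only way to do it: it assumes a bs4 version that accepts the string argument (the newer name for text) and reuses the HTML from the question. The variable names strings and th_tags are illustrative.

```python
from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, "html.parser")

# Nested list comprehension: still a loop, but it flattens the
# per-tag results into a single list of matching strings.
strings = [s for th in soup.find_all("th") for s in th.find_all(string="A")]

# Passing both a tag name and string to one find_all call returns
# the <th> tags whose .string equals "A", with no explicit loop.
th_tags = soup.find_all("th", string="A")

print(strings)
print(th_tags)
```

Note that the first form returns the matching strings, while the second returns the th tags that contain them, so which one fits depends on what you need afterwards.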

CSS selectors can often solve this in one go, although not everything you can do with find_all() is possible with the select() method. For instance, standard CSS selectors have no "text" search. But if, for example, you had to find all b elements inside th elements, you could do:

soup.select("th b")
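To see a descendant selector in action, here is a small sketch. The markup is hypothetical (the HTML in the question has no b elements), and the names html, bold_in_th and texts are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: some <th> cells wrap their text in <b>.
html = """
<table>
  <tr>
    <th><b>A</b></th>
    <th>B</th>
    <th><b>C</b></th>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# "th b" matches every <b> that is a descendant of a <th>,
# so no explicit loop over the <th> elements is needed.
bold_in_th = soup.select("th b")
texts = [b.get_text() for b in bold_in_th]

print(texts)
```

select() returns the matching tags in document order, so the nested search happens inside the selector rather than in your own loop.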
alecxe answered Oct 22 '22 06:10