
Beautiful Soup select vs find_all data type

I am new to web scraping, and there seem to be two ways to gather all the HTML data I am looking for.

option_1 = soup.find_all('div', class_='p')

option_2 = soup.select('div.p')

I see that option_1 returns class 'bs4.element.ResultSet' and option_2 returns class 'list'

I can still iterate through option_1 with a for loop, so what is the difference between:

  1. select and find_all
  2. 'list' and bs4.element.ResultSet
Mwspencer asked Oct 19 '17 18:10


People also ask

What is the difference between find_all() and find() in Beautiful Soup?

find() returns the first matching element, or None if nothing matches. find_all() scans the entire document and returns all matches.

What is the primary difference between find() and find_all()?

find() returns the first element matching the given tag. find_all() returns all elements matching the given tag.
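As a rough illustration of the difference described above, here is a minimal sketch. The HTML snippet and variable names are made up for the example, and it assumes the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = """
<div class="p">first</div>
<div class="p">second</div>
<span class="p">not a div</span>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag (or None).
first = soup.find("div", class_="p")
first_text = first.get_text()

# find_all() collects every match into a list-like result.
all_divs = soup.find_all("div", class_="p")
all_texts = [tag.get_text() for tag in all_divs]

# When nothing matches, find() gives None and find_all() gives an empty result.
missing = soup.find("table")
missing_all = soup.find_all("table")
```

Because find() may return None, it is usually worth checking the result before calling methods on it.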

What is select in Beautiful Soup?

Beautiful Soup is a popular Python package for scraping web content, and it offers several methods for extracting it. One of them is select(), which takes a CSS selector as its argument and returns the elements that match it.
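To make the CSS selector idea concrete, here is a small sketch comparing select('div.p') with the equivalent find_all() call. The markup and variable names are invented for the example, and "html.parser" is assumed as the backend.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = """
<div id="main">
  <div class="p">alpha</div>
  <p class="p">beta</p>
  <div class="p extra">gamma</div>
</div>
<div class="p">delta</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Both calls look for <div> elements carrying the class "p".
by_selector = soup.select("div.p")
by_find_all = soup.find_all("div", class_="p")
selector_texts = [tag.get_text() for tag in by_selector]
find_all_texts = [tag.get_text() for tag in by_find_all]

# A selector can also express structure, e.g. direct children of #main.
nested_texts = [tag.get_text() for tag in soup.select("#main > div.p")]

# select_one() returns only the first match, much like find().
first_match = soup.select_one("div.p")
```

For a simple tag-plus-class lookup the two approaches should find the same elements. The selector form tends to be more convenient once the query depends on nesting or on several conditions at once.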

What does find_all() return in Beautiful Soup?

Beautiful Soup's find_all() method returns a list (more precisely a ResultSet, which is a list subclass) of all the tags or strings that match the given criteria.


1 Answer

You should find the answer to your first question here (linked by t-m-adam in the comments).

As for the second question, let's take a look at the source code :)

class ResultSet(list):
    """A ResultSet is just a list that keeps track of the SoupStrainer
    that created it."""
    def __init__(self, source, result=()):
        super(ResultSet, self).__init__(result)
        self.source = source

    def __getattr__(self, key):
        raise AttributeError(
            "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
        )

ResultSet is just a subclass of list used to store the results of the find_all() method, so it supports iteration, indexing, len() and every other list operation.
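To see what that means in practice without installing anything, here is a stand-alone sketch. The class below is a simplified stand-in modelled on the source quoted above (with a shortened error message), not the library's own class, and the sample strings are placeholders for real Tag objects.

```python
class ResultSet(list):
    """Simplified stand-in for bs4's ResultSet, for illustration only."""

    def __init__(self, source, result=()):
        super().__init__(result)
        self.source = source

    def __getattr__(self, key):
        # Only called for attributes that a normal lookup cannot find.
        raise AttributeError(
            "ResultSet object has no attribute '%s'." % key
        )


# Placeholder strings stand in for the Tag objects find_all() would return.
results = ResultSet(source=None, result=["<div>a</div>", "<div>b</div>"])

is_list = isinstance(results, list)   # it is a list, just a specialised one
count = len(results)
first = results[0]
as_plain_list = list(results)         # convert if a plain list is needed

# Treating the whole result like a single element triggers the custom error.
try:
    results.text
except AttributeError as exc:
    error_message = str(exc)
```

So the only practical differences are the extra source attribute and the friendlier error when you treat the whole result as if it were a single element.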

radzak answered Oct 06 '22 00:10