I am new to web scraping, and there seem to be two ways to gather ALL the HTML data I am looking for.
option_1 = soup.find_all('div', class_='p')
option_2 = soup.select('div.p')
I see that option_1 returns class 'bs4.element.ResultSet' and option_2 returns class 'list'
I can still iterate through option_1 with a for loop, so what is the difference between:
find() returns a single result when the searched element is found on the page (and None otherwise). find_all() returns all the matches after scanning the entire document.
find() returns the first element with the given tag. find_all() returns all the elements with the given tag.
Beautiful Soup is a popular Python package that lets you scrape web content easily. It offers many methods for extracting content, and select() is one of them. The select() method takes a CSS selector as its argument and returns the content that matches that selector.
Beautiful Soup's find_all(~) method returns a list of all the tags or strings that match the given criteria.
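As a rough illustration of the descriptions above, here is a minimal sketch comparing find(), find_all() and select(). The HTML snippet and variable names are made up for the example, and it assumes bs4 is installed and uses the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = """
<div class="p">first</div>
<div class="p q">second</div>
<span class="p">third</span>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching tag.
first = soup.find("div", class_="p")

# find_all() and select() both collect every matching <div>;
# the <span> is skipped because its tag name does not match.
all_divs = soup.find_all("div", class_="p")
selected = soup.select("div.p")

first_text = first.get_text()
find_all_texts = [tag.get_text() for tag in all_divs]
select_texts = [tag.get_text() for tag in selected]

print(first_text)
print(find_all_texts)
print(select_texts)
```

For a simple tag-plus-class query like this one, both calls should pick out the same elements; select() mainly becomes more convenient once the query needs CSS features such as descendant or attribute selectors.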
You should find the answer to your first question here (linked by t-m-adam in the comments).
As for the second question let's take a look at the source code :)
class ResultSet(list):
    """A ResultSet is just a list that keeps track of the SoupStrainer
    that created it."""
    def __init__(self, source, result=()):
        super(ResultSet, self).__init__(result)
        self.source = source

    def __getattr__(self, key):
        raise AttributeError(
            "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
        )
ResultSet is just a subclass of list used to store the results of the find_all() method.
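To see what that means in practice, here is a small sketch that redefines a stand-in copy of the class quoted above, so it runs without bs4 installed. The sample strings and variable names are only placeholders. It shows that a ResultSet behaves like a list for isinstance checks and iteration, and that the only extra behaviour is the more helpful error raised when you treat it like a single tag.

```python
class ResultSet(list):
    """Stand-in copy of the class quoted above (not imported from bs4)."""

    def __init__(self, source, result=()):
        super().__init__(result)
        self.source = source

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails.
        raise AttributeError(
            "ResultSet object has no attribute '%s'. You're probably treating "
            "a list of items like a single item. Did you call find_all() when "
            "you meant to call find()?" % key
        )


# Placeholder items; real results would be Tag objects.
results = ResultSet(source=None, result=["<div>a</div>", "<div>b</div>"])

is_list = isinstance(results, list)   # it is a list subclass
items = [item for item in results]    # iterates like any list

try:
    results.text                      # a single-tag attribute, not a list one
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(is_list, items)
print(error_message)
```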