I was wondering what is the difference between performing bs.find('div') and bs.select_one('div'). Same goes for find_all and select.
Is there any difference performance-wise, or is one better to use than the other in specific cases?
In the timings below, select_one is much faster than find:
In [13]: req = requests.get("https://httpbin.org/")
In [14]: soup = BeautifulSoup(req.content, "html.parser")
In [15]: soup.select_one("#DESCRIPTION")
Out[15]: <h2 id="DESCRIPTION">DESCRIPTION</h2>
In [16]: soup.find("h2", id="DESCRIPTION")
Out[16]: <h2 id="DESCRIPTION">DESCRIPTION</h2>
In [17]: timeit soup.find("h2", id="DESCRIPTION")
100 loops, best of 3: 5.27 ms per loop
In [18]: timeit soup.select_one("#DESCRIPTION")
1000 loops, best of 3: 649 µs per loop
In [19]: timeit soup.select_one("div")
10000 loops, best of 3: 61 µs per loop
In [20]: timeit soup.find("div")
1000 loops, best of 3: 446 µs per loop
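The session above depends on a live page and on IPython's timeit magic. As a rough sketch of how the same comparison could be rerun offline, the script below uses the standard timeit module on an inline document. The HTML string, the CASES table and the benchmark helper are made up for illustration. The numbers depend on the Beautiful Soup version, the parser and the document, so it is worth measuring on your own setup instead of relying on the figures quoted here.

```python
import timeit

from bs4 import BeautifulSoup

# Hypothetical sample document, built inline so no network request is needed.
HTML = (
    "<html><body>"
    + "".join(f"<div class='row'><p>item {i}</p></div>" for i in range(200))
    + "<h2 id='DESCRIPTION'>DESCRIPTION</h2>"
    + "</body></html>"
)

soup = BeautifulSoup(HTML, "html.parser")

# Pairs of calls that should locate the same element.
CASES = {
    'find("h2", id="DESCRIPTION")': lambda: soup.find("h2", id="DESCRIPTION"),
    'select_one("#DESCRIPTION")': lambda: soup.select_one("#DESCRIPTION"),
    'find("div")': lambda: soup.find("div"),
    'select_one("div")': lambda: soup.select_one("div"),
}


def benchmark(number=200):
    """Return the average seconds per call for each case."""
    return {
        label: timeit.timeit(func, number=number) / number
        for label, func in CASES.items()
    }


if __name__ == "__main__":
    for label, seconds in benchmark().items():
        print(f"{label:32s} {seconds * 1e6:10.1f} µs per call")
```

The script prints one average per call so the two styles can be compared side by side; no particular ordering of the results is assumed.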
find is essentially find_all with limit set to 1: it checks whether the returned list is empty, returns the first element if it is not, and returns None if it is.
def find(self, name=None, attrs={}, recursive=True, text=None,
         **kwargs):
    """Return only the first child of this Tag matching the given
    criteria."""
    r = None
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    if l:
        r = l[0]
    return r
select_one does something similar using select:
def select_one(self, selector):
    """Perform a CSS selection operation on the current element."""
    value = self.select(selector, limit=1)
    if value:
        return value[0]
    return None
The cost is much lower with select because it does not have all of find_all's keyword arguments to process.
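As a quick check of the two wrappers quoted above, the following sketch confirms that each single-result method agrees with its list-returning counterparts limited to one result, and that both return None when nothing matches. The sample markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
HTML = "<div id='a'>first</div><div id='b'>second</div>"
soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first item that find_all(..., limit=1) would return.
first_by_find = soup.find("div")
assert first_by_find == soup.find_all("div", limit=1)[0]

# select_one() returns the first item that select(..., limit=1) would return.
first_by_select = soup.select_one("div")
assert first_by_select == soup.select("div", limit=1)[0]

# Both single-result methods return None when nothing matches,
# while the list-returning methods return an empty result.
assert soup.find("span") is None
assert soup.select_one("span") is None
assert len(soup.find_all("span")) == 0
assert len(soup.select("span")) == 0

print(first_by_find["id"], first_by_select["id"])
```

So for a plain tag lookup the two styles return the same element; the practical difference is the query syntax (keyword filters versus a CSS selector) and whatever overhead each path adds.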
The question "Beautifulsoup : Is there a difference between .find() and .select() - python 3.xx" covers a bit more on the differences.