
Bs4 select_one vs find

I was wondering what the difference is between bs.find('div') and bs.select_one('div'), and likewise between find_all and select.

Is there any difference performance-wise, or is one better to use than the other in specific cases?

asked Aug 19 '16 07:08 by Salma Hamed



1 Answer

In the timings below, select_one was much faster than find:

In [13]: req = requests.get("https://httpbin.org/")

In [14]: soup = BeautifulSoup(req.content, "html.parser")

In [15]: soup.select_one("#DESCRIPTION")
Out[15]: <h2 id="DESCRIPTION">DESCRIPTION</h2>

In [16]: soup.find("h2", id="DESCRIPTION")
Out[16]: <h2 id="DESCRIPTION">DESCRIPTION</h2>

In [17]: timeit soup.find("h2", id="DESCRIPTION")
100 loops, best of 3: 5.27 ms per loop

In [18]: timeit soup.select_one("#DESCRIPTION")
1000 loops, best of 3: 649 µs per loop

In [19]: timeit soup.select_one("div")
10000 loops, best of 3: 61 µs per loop

In [20]: timeit soup.find("div")
1000 loops, best of 3: 446 µs per loop
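The session above fetches a live page, so its numbers depend on whatever httpbin.org served at the time. Below is a minimal sketch for rerunning the comparison offline with the standard-library timeit module. The HTML string and the compare helper are hypothetical, made up for illustration. Expect both absolute and relative timings to vary with the bs4 release and parser; bs4 4.7.0 and later hand CSS selection to the soupsieve package, so results may not match the figures above.

```python
import timeit

from bs4 import BeautifulSoup

# Hypothetical test page: a few hundred divs followed by one h2 with an id,
# so both lookups have to walk past the divs before they find a match.
HTML = (
    "<html><body>"
    + "".join(f"<div class='row'><p>row {i}</p></div>" for i in range(300))
    + '<h2 id="DESCRIPTION">DESCRIPTION</h2>'
    + "</body></html>"
)


def compare(html: str, number: int = 200) -> dict:
    """Time find() against select_one() for the same element.

    Returns the average number of seconds per call for each method.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Sanity check: both calls must locate the very same Tag object.
    assert soup.find("h2", id="DESCRIPTION") is soup.select_one("#DESCRIPTION")

    totals = {
        "find": timeit.timeit(
            lambda: soup.find("h2", id="DESCRIPTION"), number=number
        ),
        "select_one": timeit.timeit(
            lambda: soup.select_one("#DESCRIPTION"), number=number
        ),
    }
    return {name: total / number for name, total in totals.items()}


if __name__ == "__main__":
    for name, seconds in compare(HTML).items():
        print(f"{name:>10}: {seconds * 1e6:.1f} µs per call")
```

Using a fixed local document keeps the parse out of the measurement and makes runs comparable across bs4 versions.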

find is essentially find_all with the limit set to 1: it calls find_all, then returns the first element of the resulting list if it is non-empty, or None if it is empty.

def find(self, name=None, attrs={}, recursive=True, text=None,
         **kwargs):
    """Return only the first child of this Tag matching the given
    criteria."""
    r = None
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    if l:
        r = l[0]
    return r
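To make the pattern concrete outside bs4, here is a toy model in plain Python. The names find_all and find mirror bs4's, but these hypothetical versions work on any iterable with a predicate function; they are an illustration of the limit-then-index idea, not the library's code.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def find_all(items: Iterable[T], match: Callable[[T], bool],
             limit: Optional[int] = None) -> List[T]:
    """Collect matching items, stopping early once `limit` matches are found."""
    results: List[T] = []
    for item in items:
        if match(item):
            results.append(item)
            if limit is not None and len(results) >= limit:
                break  # enough matches; stop scanning the rest
    return results


def find(items: Iterable[T], match: Callable[[T], bool]) -> Optional[T]:
    """Return the first match or None, built on find_all(limit=1)."""
    matches = find_all(items, match, limit=1)
    return matches[0] if matches else None
```

For example, with items = [("p", 1), ("div", 2), ("div", 3)], find(items, lambda t: t[0] == "div") gives ("div", 2), and searching for "span" gives None. Because of limit=1, the scan stops at the first match instead of visiting every item.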

select_one does something similar using select:

def select_one(self, selector):
    """Perform a CSS selection operation on the current element."""
    value = self.select(selector, limit=1)
    if value:
        return value[0]
    return None
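The same first-or-None wrapping can be checked directly against bs4. The sketch below uses a small made-up HTML snippet. The select_one source quoted above comes from an older bs4 release, but select() still accepts a limit argument in current versions, so these equivalences should hold either way.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, just enough to exercise both APIs.
HTML = """
<div id="first"><p class="note">one</p></div>
<div id="second"><p class="note">two</p></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() behaves like the first element of find_all(..., limit=1).
assert soup.find("div") is soup.find_all("div", limit=1)[0]

# select_one() behaves like the first element of select(..., limit=1).
assert soup.select_one("div") is soup.select("div", limit=1)[0]

# Keyword-argument filters and CSS selectors can reach the same element.
assert soup.find("p", class_="note") is soup.select_one("p.note")

# When nothing matches, both singular forms return None instead of raising.
assert soup.find("span") is None
assert soup.select_one("span") is None

# The plural forms return lists of every match.
print(len(soup.find_all("p")), len(soup.select("p")))  # 2 2
```

The practical difference is mostly the query language: find/find_all take a tag name plus keyword filters, while select/select_one take a CSS selector string.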

The cost is much lower with select, which has none of find_all's keyword arguments to process.

The question "Beautifulsoup : Is there a difference between .find() and .select() - python 3.xx" covers the differences in a bit more detail.

answered Oct 27 '22 20:10 by Padraic Cunningham
