
Using lambda functions in beautiful soup

I'm trying to match links whose href contains a certain piece of text. I'm doing:

links = soup.find_all('a',href=lambda x: ".org" in x)

But that throws a TypeError: argument of type 'NoneType' is not iterable.

The correct way of doing it is apparently:

links = soup.find_all('a',href=lambda x: x and ".org" in x)

Why is the additional "x and" necessary here?

shem asked Oct 24 '25 13:10


1 Answer

There's a simple reason: at least one of the <a> tags in your HTML has no href attribute.


Here's a minimal example that reproduces the exception:

from bs4 import BeautifulSoup

html = '<html><body><a>bar</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all('a', href=lambda x: ".org" in x)
# result:
# TypeError: argument of type 'NoneType' is not iterable

Now if we add an href attribute, the exception disappears:

html = '<html><body><a href="foo.org">bar</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all('a', href=lambda x: ".org" in x)
# result:
# [<a href="foo.org">bar</a>]

What's happening is that BeautifulSoup looks up each <a> tag's href attribute and passes the result to your lambda. When the attribute doesn't exist, that result is None:

html = '<html><body><a>bar</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

print(soup.a.get('href'))
# output: None

This is why your lambda has to handle None. Since None is falsy, writing "x and ..." prevents the right-hand side of the and expression from being evaluated when x is None, as you can see here:

>>> None and 1/0
>>> 'foo.org' and 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

This is called short-circuiting.
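To make the short-circuit behaviour concrete, here is a small sketch that calls the same expression directly, without BeautifulSoup. The helper name loose_filter and the sample values are made up for illustration; the None input stands in for a tag without an href.

```python
# Hypothetical helper mirroring the lambda from the question.
def loose_filter(x):
    return x and ".org" in x

# `and` returns its left operand when that operand is falsy,
# so the membership test on the right never runs for None or "".
results = {
    "missing href": loose_filter(None),
    "empty href": loose_filter(""),
    "matching href": loose_filter("foo.org"),
    "other href": loose_filter("foo.com"),
}

for label, value in results.items():
    print(f"{label}: {value!r}")
```

Note that for falsy inputs the function hands back the input itself (None or an empty string) rather than False. Both are falsy, so the tag is still treated as a non-match.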


That said, "x and ..." checks the truthiness of x, and None is not the only value that's considered falsy (an empty string is falsy too). So it would be more precise to compare x to None explicitly, like so:

lambda x: x is not None and ".org" in x
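If the lambda starts to feel cramped, the same check can be written as a named function. This is only a sketch: the name href_contains_org and the sample values are assumptions for illustration, and the list stands in for the attribute values BeautifulSoup would pass to the filter.

```python
from typing import Optional


def href_contains_org(href: Optional[str]) -> bool:
    """Return True only when href is present and contains '.org'."""
    return href is not None and ".org" in href


# Stand-in values: None represents an <a> tag with no href attribute.
sample_hrefs = [None, "", "foo.org", "foo.com"]
matches = [h for h in sample_hrefs if href_contains_org(h)]
print(matches)
```

Because find_all accepts any callable as an attribute filter, this function should be usable in place of the lambda, as in soup.find_all('a', href=href_contains_org). Unlike the "x and ..." form, it always returns a real bool.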
Aran-Fey answered Oct 26 '25 04:10