I'd like to do something like this:
soup.find_all('td', attrs!={"class":"foo"})
I want to find all td elements that do not have the class foo.
Obviously the above doesn't work; what does?
BeautifulSoup really makes the "soup" beautiful and easy to work with.
You can pass a function in the attribute value:
soup.find_all('td', class_=lambda x: x != 'foo')
Demo:
>>> from bs4 import BeautifulSoup
>>> data = """
... <tr>
... <td>1</td>
... <td class="foo">2</td>
... <td class="bar">3</td>
... </tr>
... """
>>> soup = BeautifulSoup(data, "html.parser")
>>> for element in soup.find_all('td', class_=lambda x: x != 'foo'):
...     print(element.text)
...
1
3
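One caveat worth checking in your own data: Beautiful Soup matches a class_ filter against each class of a multi-class tag separately, so a cell like class="foo bar" can still pass the lambda above because "bar" is not "foo". A minimal sketch of a stricter filter follows, assuming such cells should be excluded. The sample markup and the helper name lacks_foo are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the last cell carries two classes.
data = """
<table><tr>
<td>1</td>
<td class="foo">2</td>
<td class="bar">3</td>
<td class="foo bar">4</td>
</tr></table>
"""
soup = BeautifulSoup(data, "html.parser")


def lacks_foo(tag):
    # In HTML documents tag.get("class") is a list of class names;
    # the default [] covers tags that have no class attribute at all.
    return tag.name == "td" and "foo" not in tag.get("class", [])


# Filter on the whole tag, so every class on the cell is inspected at once.
strict = [td.text for td in soup.find_all(lacks_foo)]

# The per-class lambda from the answer, kept here only for comparison.
loose = [td.text for td in soup.find_all("td", class_=lambda c: c != "foo")]

print("strict:", strict)
print("loose:", loose)
```

Passing a function that receives the whole tag, rather than one attribute value, lets you look at the full class list in a single check.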
There is a method .select() which allows you to pass CSS selectors as a string:
soup.select('td:not(.foo)')
The above code will return all <td> tags which are not of the class foo.
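For completeness, here is a self-contained sketch of the selector approach run against the same sample markup as the demo in the other answer. It assumes a Beautiful Soup version where .select() supports :not() (recent releases use the soupsieve package for this); the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Same sample rows as the demo in the other answer.
data = """
<tr>
<td>1</td>
<td class="foo">2</td>
<td class="bar">3</td>
</tr>
"""
soup = BeautifulSoup(data, "html.parser")

# :not(.foo) keeps every <td> whose class list does not contain "foo",
# including cells with no class attribute at all.
cells = soup.select("td:not(.foo)")
texts = [td.get_text() for td in cells]
print(texts)  # ['1', '3']
```

Because .foo is tested against the whole class list, a cell with class="foo bar" would be excluded here as well.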