soup.find_all will search a BeautifulSoup document for all occurrences of a single tag. Is there a way to search for particular patterns of nested tags?
For example, I would like to search for all occurrences of this pattern:
<div class="separator">
<a>
<img />
</a>
</div>
There are multiple ways to find the pattern, but the easiest one would be to use a CSS selector:
for img in soup.select('div.separator > a > img'):
    print(img)  # or img.parent.parent to get the "div"
Demo:
>>> from bs4 import BeautifulSoup
>>> data = """
... <div>
... <div class="separator">
... <a>
... <img src="test1"/>
... </a>
... </div>
...
... <div class="separator">
... <a>
... <img src="test2"/>
... </a>
... </div>
...
... <div>test3</div>
...
... <div>
... <a>test4</a>
... </div>
... </div>
... """
>>> soup = BeautifulSoup(data, "html.parser")
>>>
>>> for img in soup.select('div.separator > a > img'):
...     print(img.get('src'))
...
test1
test2
I do understand that, strictly speaking, the selector is not exact: it would still match if the div has more than one a child, or if the a tag contains something other than the img tag. If that matters, the solution can be improved with additional checks (will edit the answer if needed).
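One possible shape for those additional checks, as a rough sketch rather than a definitive implementation: select the candidate divs with the CSS selector, then count the element children at each level. The helper names element_children and strict_matches and the sample HTML are made up for this illustration, and only child tags are counted, so stray text inside the div or the a is not rejected.

```python
from bs4 import BeautifulSoup


def element_children(tag):
    """Return only the direct child tags of `tag`, skipping text nodes such as whitespace."""
    return tag.find_all(True, recursive=False)


def strict_matches(soup):
    """Return div.separator tags holding exactly one <a> that holds exactly one <img>."""
    matches = []
    for div in soup.select("div.separator"):
        kids = element_children(div)
        if len(kids) != 1 or kids[0].name != "a":
            continue  # extra siblings next to the <a>, or no <a> at all
        grandkids = element_children(kids[0])
        if len(grandkids) == 1 and grandkids[0].name == "img":
            matches.append(div)
    return matches


# Hypothetical sample: only the first div has the exact div > a > img shape.
html = """
<div class="separator"><a><img src="test1"/></a></div>
<div class="separator"><a><img src="test2"/></a><a>extra</a></div>
<div class="separator"><a><img src="test3"/><span>x</span></a></div>
"""
soup = BeautifulSoup(html, "html.parser")
result = [div.a.img["src"] for div in strict_matches(soup)]
print(result)
```

The plain selector 'div.separator > a > img' would return all three images here; the strict version keeps only the first div.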
Check out this part of the docs. You probably want a function like this:
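def nested_img(tag):
    # Match a <div> whose first child tag is an <a> whose first child tag is an <img>.
    if tag.name != "div":
        return False
    a = tag.find(True, recursive=False)  # first child tag; skips whitespace text nodes
    img = a.find(True, recursive=False) if a is not None and a.name == "a" else None
    return img is not None and img.name == "img"
soup.find_all(nested_img)  # a function passed as the first argument receives each tag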
P.S.: This is untested.