Beautiful Soup: searching for a nested pattern?

soup.find_all will search a BeautifulSoup document for all occurrences of a single tag. Is there a way to search for particular patterns of nested tags?

For example, I would like to search for all occurrences of this pattern:

<div class="separator">
  <a>
    <img />
  </a>
</div>

Mark Harrison asked Oct 31 '22 23:10


2 Answers

There are multiple ways to find the pattern, but the easiest one would be to use a CSS selector:

for img in soup.select('div.separator > a > img'):
    print(img)  # or img.parent.parent to get the "div"

Demo:

>>> from bs4 import BeautifulSoup
>>> data = """
... <div>
...     <div class="separator">
...       <a>
...         <img src="test1"/>
...       </a>
...     </div>
... 
...     <div class="separator">
...       <a>
...         <img src="test2"/>
...       </a>
...     </div>
... 
...     <div>test3</div>
... 
...     <div>
...         <a>test4</a>
...     </div>
... </div>
... """
>>> soup = BeautifulSoup(data, "html.parser")
>>> 
>>> for img in soup.select('div.separator > a > img'):
...     print(img.get('src'))
... 
test1
test2
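
The comment in the first snippet mentions img.parent.parent as a way to get back to the div. A minimal sketch of that step, assuming the same selector; the id attribute in the sample markup is hypothetical and only there so the result can be printed:

```python
from bs4 import BeautifulSoup

# Sample markup; the id attribute is an illustrative assumption.
html = '<div class="separator" id="d1"><a><img src="test1"/></a></div>'
soup = BeautifulSoup(html, "html.parser")

divs = []
for img in soup.select("div.separator > a > img"):
    # The selector guarantees img -> a -> div, so two .parent hops reach the div.
    divs.append(img.parent.parent)

print([d["id"] for d in divs])
```

Because the child combinator (>) fixes the nesting depth, two .parent hops are enough here; with a looser selector, img.find_parent("div") would be the safer choice.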

Strictly speaking, this selector is not an exact match for the pattern: it still matches when the div has more than one a child, or when the a tag contains something other than the img tag. If that matters, the solution can be improved with additional checks (will edit the answer if needed).
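
One possible form of those additional checks, as a sketch rather than a definitive implementation: count only element children at each level and require exactly one. The helper names element_children and strict_matches are made up for this example, and text nodes (including whitespace) are deliberately ignored, which is an assumption about what "exact" should mean.

```python
from bs4 import BeautifulSoup

def element_children(tag):
    """Return direct child tags, skipping text nodes such as whitespace."""
    return tag.find_all(True, recursive=False)

def strict_matches(soup):
    """Yield div.separator tags whose only element child is an <a>
    whose only element child is an <img>."""
    for div in soup.select("div.separator"):
        kids = element_children(div)
        if len(kids) != 1 or kids[0].name != "a":
            continue
        inner = element_children(kids[0])
        if len(inner) == 1 and inner[0].name == "img":
            yield div

html = """
<div class="separator"><a><img src="ok"/></a></div>
<div class="separator"><a><img src="extra"/><span>x</span></a></div>
<div class="separator"><a><img src="two"/></a><a>second</a></div>
"""
soup = BeautifulSoup(html, "html.parser")
srcs = [div.a.img["src"] for div in strict_matches(soup)]
print(srcs)
```

Only the first div passes: the second has an extra span inside the a, and the third has a second a child.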

alecxe answered Nov 08 '22 10:11


Check out this part of the docs. You probably want a function like this:

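def nested_img(tag):
    # Look at direct tag children only; contents[0] is often a whitespace string.
    a = tag.find("a", recursive=False) if tag.name == "div" else None
    return a is not None and a.find("img", recursive=False) is not None

soup.find_all(nested_img)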

P.S.: This is untested.
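
For anyone who wants to try the function-filter idea on sample markup, here is a self-contained sketch. The function name is_separator_with_img and the sample HTML are assumptions for illustration; the class check is added to match the pattern in the question.

```python
from bs4 import BeautifulSoup

def is_separator_with_img(tag):
    """True for <div class="separator"> with a direct <a> child
    that in turn has a direct <img> child."""
    # class is a multi-valued attribute, so bs4 returns it as a list.
    if tag.name != "div" or "separator" not in tag.get("class", []):
        return False
    a = tag.find("a", recursive=False)
    return a is not None and a.find("img", recursive=False) is not None

html = """
<div class="separator">
  <a><img src="test1"/></a>
</div>
<div class="separator"><a>no image</a></div>
<div><a><img src="test2"/></a></div>
"""
soup = BeautifulSoup(html, "html.parser")
# find_all calls the function once per tag and keeps those returning True.
found = [div.a.img["src"] for div in soup.find_all(is_separator_with_img)]
print(found)
```

Using find(..., recursive=False) rather than contents[0] avoids tripping over the whitespace text nodes that indented HTML puts between tags.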

Midnighter answered Nov 08 '22 09:11