
Position of tags in an HTML page using Python 2.7 with BeautifulSoup

I am trying to parse an HTML page with the following format:

<img class="outer" id="first" />
<div class="content" .../>
<div class="content" .../>
<div class="content" />
<img class="outer" id="second" />
<div class="content" .../>
<div class="content" .../>
<img class="outer" id="third" />
<div class="content" .../>
<div class="content" .../>

While iterating over the div tags, I want to determine whether the current div falls under the img tag with id 'first', 'second', or 'third'. Is there a way to do that? I already have the lists of img and div blocks:

img_blocks = soup.find_all('img', attrs={'class':'outer'})
div_Blocks = soup.find_all('div', attrs={'class':'content'})
Ranjan asked Jun 30 '13 07:06



1 Answer

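Use .find_previous_sibling, which returns the closest preceding sibling that matches. For each div, that is the nearest img above it: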

>>> for divtag in div_Blocks:
...     print divtag.find_previous_sibling('img')
... 
<img class="outer" id="first"/>
<img class="outer" id="first"/>
<img class="outer" id="first"/>
<img class="outer" id="second"/>
<img class="outer" id="second"/>
<img class="outer" id="third"/>
<img class="outer" id="third"/>
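
To get just the id rather than the whole tag, the same call can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the helper name owning_img_id and the sample markup are made up for illustration, the elided div attributes are left out, and the divs are assumed to have explicit closing tags and to be siblings of the img tags, as in the question. It avoids print so it runs unchanged on Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; attributes elided there are omitted here.
html = """
<img class="outer" id="first" />
<div class="content"></div>
<div class="content"></div>
<div class="content"></div>
<img class="outer" id="second" />
<div class="content"></div>
<div class="content"></div>
<img class="outer" id="third" />
<div class="content"></div>
<div class="content"></div>
"""

soup = BeautifulSoup(html, "html.parser")


def owning_img_id(tag):
    """Return the id of the nearest preceding sibling <img class="outer">, or None."""
    img = tag.find_previous_sibling("img", class_="outer")
    return img.get("id") if img is not None else None


# One id per content div, in document order.
div_ids = [owning_img_id(div) for div in soup.find_all("div", class_="content")]
```

Here div_ids lines up one-to-one with the div list, so zip(div_Blocks, div_ids) would pair each div with its img id. The helper returns None for a tag that has no matching img before it at the same level, which is worth checking if the real page nests the divs inside other elements.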
TerryA answered Sep 21 '22 03:09
