 

beautifulsoup: find the n-th element's sibling

I have a complex HTML DOM tree of the following form:

<table>
    ...
    <tr>
        <td>
            ...
        </td>
        <td>
            <table>
                <tr>
                    <td>
                        <!-- innermost table -->
                        <table>
                            ...
                        </table>

                        <h2>This is hell!</h2>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

I have some logic to find the innermost table. Once I have found it, I need to get its next sibling element (the h2). Is there any way to do this?

deostroll asked Apr 10 '10 13:04



1 Answer

If tag is the innermost table, then

tag.findNextSibling('h2')

will be

<h2>This is hell!</h2>
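As a minimal end-to-end sketch of that call: this assumes the newer bs4 package, where the method is spelled find_next_sibling, and Python's built-in html.parser. The sample markup and the way the innermost table is located (a table that contains no other table) are assumptions for illustration, not the asker's actual logic.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the question's markup (assumed for illustration).
html = """
<table>
  <tr>
    <td>
      <table>
        <tr><td>cell</td></tr>
      </table>

      <h2>This is hell!</h2>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Hypothetical "innermost table" logic: the first table with no table inside it.
tag = next(t for t in soup.find_all("table") if t.find("table") is None)

# Skips the whitespace between the tags and returns the next <h2> sibling.
heading = tag.find_next_sibling("h2")
print(heading)  # <h2>This is hell!</h2>
```

If there is no matching sibling, find_next_sibling returns None, so the result is worth checking before use.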

To get the literal next sibling, you could use tag.nextSibling, which in this case is u'\n' (the whitespace between </table> and <h2>).

If you want the next sibling that is not a NavigableString (such as u'\n'), then you could use

tag.findNextSibling(text=None)
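The difference between the literal next sibling and the next sibling element can be seen in a short sketch. It assumes the bs4 package, where the attribute is spelled next_sibling, and it passes True as the tag name, which matches any tag and therefore skips strings. The one-line markup is made up for the example.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Tiny made-up document: a table, a newline, then a heading.
soup = BeautifulSoup(
    "<div><table></table>\n<h2>This is hell!</h2></div>", "html.parser"
)
tag = soup.find("table")

# The literal next sibling is the newline string between the two tags.
literal = tag.next_sibling

# Asking for any tag (True) skips strings and lands on the <h2>.
element = tag.find_next_sibling(True)

print(repr(literal))
print(element)
```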

If you want the second sibling (no matter what it is), you could use

tag.nextSibling.nextSibling

(Note, however, that if tag has no next sibling, tag.nextSibling is None, so tag.nextSibling.nextSibling raises an AttributeError.)
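One way to avoid that AttributeError is to walk the siblings lazily rather than chaining attribute lookups. The sketch below assumes the bs4 package and its next_siblings generator. The helper name nth_next_sibling is hypothetical, not part of BeautifulSoup.

```python
from itertools import islice

from bs4 import BeautifulSoup


def nth_next_sibling(tag, n):
    """Return the n-th following sibling (1-based), or None if there are fewer than n.

    Hypothetical helper: counts every sibling, including whitespace strings.
    """
    return next(islice(tag.next_siblings, n - 1, None), None)


# Made-up document: <p> is followed by a newline string and then an <h2>.
soup = BeautifulSoup("<div><p>a</p>\n<h2>b</h2></div>", "html.parser")
p = soup.find("p")

first = nth_next_sibling(p, 1)   # the newline string
second = nth_next_sibling(p, 2)  # the <h2> tag
third = nth_next_sibling(p, 3)   # None, because there is no third sibling
```

Because the helper returns None instead of raising, the caller can test the result once rather than guarding each step of the chain.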

unutbu answered Oct 05 '22 06:10
