 

Trying to use BeautifulSoup to find a specific table in an HTML doc

The HTML page I'm trying to read has 21 tables. The table I want can be identified by its unique <caption>; not all of the tables even have a caption.

Here is a snippet of the structure:

<table class="wikitable">
    <caption>Very long caption</caption>
    <tbody>
        <tr align="center" bgcolor="#efefef">

I've tried:

soup = BeautifulSoup(r.text, "html.parser")
table1 = soup.find('table', caption="Very long caption")

But this returns None.

Jeff Barrette asked Dec 30 '15 01:12




1 Answer

soup.find('table', caption="Very long caption")

This means: locate a table element that has a caption attribute whose value is "Very long caption". No table has such an attribute, because the caption is a child element of the table, so find() returns None.

What I would do is to locate the caption element by text and get the parent table element:

soup.find("caption", text="Very long caption").find_parent("table")
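Here is a minimal, self-contained sketch of the same idea. The sample HTML, the second caption, and the helper name find_table_by_caption are made up for illustration and are not from the question. The helper uses a substring match on the caption text instead of an exact match, which is an assumption about what is convenient: an exact match can miss captions that carry surrounding whitespace or nested markup. Note also that in Beautiful Soup 4.4.0 and later, string= is the preferred name for the text= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: two captioned tables and one without a caption.
html = """
<table class="wikitable">
    <caption>Population by year</caption>
    <tbody><tr><td>1</td></tr></tbody>
</table>
<table class="wikitable">
    <caption>Very long caption</caption>
    <tbody><tr align="center" bgcolor="#efefef"><td>2</td></tr></tbody>
</table>
<table class="wikitable">
    <tbody><tr><td>3</td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword arguments filter on attributes, so this searches for
# <table caption="Very long caption">, which does not exist here.
by_attribute = soup.find("table", caption="Very long caption")


def find_table_by_caption(soup, caption_text):
    """Return the first <table> whose <caption> text contains caption_text, else None."""
    for caption in soup.find_all("caption"):
        # get_text(strip=True) ignores leading/trailing whitespace and nested tags.
        if caption_text in caption.get_text(strip=True):
            return caption.find_parent("table")
    return None


table1 = find_table_by_caption(soup, "Very long caption")
```

Returning None when nothing matches also avoids the AttributeError you would get from calling find_parent() directly on a failed find().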
alecxe answered Sep 19 '22 01:09