The HTML page I'm trying to read has 21 tables. The specific table I'm trying to reference is distinctive in that it has a unique <caption>, and not all of the tables even have a caption.
Here is a snippet of the structure:
<table class="wikitable">
<caption>Very long caption</caption>
<tbody>
<tr align="center" bgcolor="#efefef">
I've tried:
soup = BeautifulSoup(r.text, "html.parser")
table1 = soup.find('table', caption="Very long caption")
But this returns None.
To extract a table from HTML, first open your browser's developer tools to see what the HTML looks like and verify that the element really is a table and not some other element. Open developer tools with the F12 key, go to the “Elements” tab, and highlight the element you're interested in.
soup.find('table', caption="Very long caption")
This basically means: locate a table element that has a caption attribute with the value Very long caption. That returns nothing, because in this HTML caption is a child element of the table, not an attribute.
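To make the difference concrete, here is a minimal sketch using made-up HTML (the two tables below are assumptions for illustration, not taken from the page in the question). Keyword arguments passed to find() filter on tag attributes, so only a table written with a caption attribute would match:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the first table has a caption *attribute*,
# the second has a <caption> *child element* (as in the question).
html = """
<table caption="Very long caption"><tr><td>attribute</td></tr></table>
<table><caption>Very long caption</caption><tr><td>child element</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# caption=... is treated as an attribute filter, so this matches the first table only.
by_attribute = soup.find("table", caption="Very long caption")
print(by_attribute.td.get_text())
```

The second table, which mirrors the structure in the question, is never considered by that call.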
What I would do is locate the caption element by its text and then get the parent table element:
soup.find("caption", text="Very long caption").find_parent("table")
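Note that the one-liner raises an AttributeError if no caption matches, because find() returns None in that case. A slightly more defensive sketch follows; the helper name find_table_by_caption and the sample HTML are assumptions for illustration. It compares the stripped caption text, so surrounding whitespace in the page does not break the match:

```python
from bs4 import BeautifulSoup

# Sample HTML modeled on the snippet in the question (cell contents are made up).
html = """
<table class="wikitable"><tbody><tr><td>no caption</td></tr></tbody></table>
<table class="wikitable">
<caption>Very long caption</caption>
<tbody>
<tr align="center" bgcolor="#efefef"><td>target</td></tr>
</tbody>
</table>
"""

def find_table_by_caption(soup, caption_text):
    """Return the <table> whose <caption> text equals caption_text, or None."""
    for caption in soup.find_all("caption"):
        # strip=True ignores leading/trailing whitespace around the caption text
        if caption.get_text(strip=True) == caption_text:
            return caption.find_parent("table")
    return None

soup = BeautifulSoup(html, "html.parser")
table1 = find_table_by_caption(soup, "Very long caption")
missing = find_table_by_caption(soup, "No such caption")  # None, no exception
```

With real data you would build soup from r.text as in the question. In recent Beautiful Soup releases the text= argument used above is also available under the name string=.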