I have a simple 4x2 HTML table that contains information about a property.
I'm trying to extract the value 1972, which is under the column heading Year Built. If I find all the td tags, how do I get the index of the tag that contains the text Year Built?
Once I have that index, I can just add 4 to get to the tag that contains the value 1972.
Here is the HTML:
<table>
<tbody>
<tr>
<td>Building</td>
<td>Type</td>
<td>Year Built</td>
<td>Sq. Ft.</td>
</tr>
<tr>
<td>R01</td>
<td>DWELL</td>
<td>1972</td>
<td>1166</td>
</tr>
</tbody>
</table>
For example, I know that if my input is the index 2 and my desired output is the text of that tag, Year Built, I can just do this:
from bs4 import BeautifulSoup
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(myhtml, 'html.parser')
td_list = soup.find_all('td')
print(td_list[2].text)
But how do I go the other way, using the text Year Built as input to get the index 2 as output?
If your table has a fixed layout, it is better to use row and column indexes. Try this:
rows = soup.find("table").find("tbody").find_all("tr")
# rows[1] is the data row; column 2 is "Year Built".
print(rows[1].find_all("td")[2].get_text())
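If you would rather not hard-code the column number, the same row-based idea can look the column up by its header text. This is only a minimal sketch: it assumes the first row holds the headers and the second row holds the values, as in the table from the question, and the names headers, values and record are made up for illustration.

```python
from bs4 import BeautifulSoup

myhtml = """
<table>
<tbody>
<tr><td>Building</td><td>Type</td><td>Year Built</td><td>Sq. Ft.</td></tr>
<tr><td>R01</td><td>DWELL</td><td>1972</td><td>1166</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(myhtml, "html.parser")
rows = soup.find("table").find_all("tr")

# Assumption: row 0 is the header row and row 1 is the data row.
headers = [td.get_text(strip=True) for td in rows[0].find_all("td")]
values = [td.get_text(strip=True) for td in rows[1].find_all("td")]

# Pair each header with the value in the same column.
record = dict(zip(headers, values))

col = headers.index("Year Built")  # column position within the row
print(col, record["Year Built"])
```

With the pairs stored in a dictionary, any other column (for example "Sq. Ft.") can be read by name in the same way.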
Alternatively, if you just want to find the index of the tag containing "Year Built":
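from bs4 import BeautifulSoup
soup = BeautifulSoup(myhtml, 'html.parser')
td_list = soup.find_all('td')
ind = None  # stays None if no cell matches
# Walk the cells and remember the index of the first one whose text matches.
for i, elem in enumerate(td_list):
    if elem.text == 'Year Built':
        ind = i
        break
print(ind)
print(td_list[ind].text)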
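The loop above can also be written with list.index, and the offset of 4 from the question can be derived from the number of cells in the first row instead of being hard-coded. Treat this as a rough sketch under the assumption that every row has the same number of td cells; texts, n_cols and value are illustrative names, not anything from the original code.

```python
from bs4 import BeautifulSoup

myhtml = """
<table>
<tbody>
<tr><td>Building</td><td>Type</td><td>Year Built</td><td>Sq. Ft.</td></tr>
<tr><td>R01</td><td>DWELL</td><td>1972</td><td>1166</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(myhtml, "html.parser")
td_list = soup.find_all("td")

# Flatten the cells to their text so list.index can search them.
texts = [td.get_text(strip=True) for td in td_list]
ind = texts.index("Year Built")  # raises ValueError if the text is absent

# Assumption: all rows have as many cells as the first row.
n_cols = len(soup.find("tr").find_all("td"))

# The value sits exactly one row (n_cols cells) after its header.
value = td_list[ind + n_cols].get_text(strip=True)
print(ind, n_cols, value)
```

Note that list.index returns only the first match, so this works as long as the header text appears once in the table.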