I'm newbie in parsing tables and regular expressions, can you help to parse this in python:
<table callspacing="0" cellpadding="0">
<tbody><tr>
<td>1text 2text</td>
<td>3text </td>
</tr>
<tr>
<td>4text 5text</td>
<td>6text </td>
</tr>
</tbody></table>
I need the "3text" and "6text"
You can use CSS selector select() and select_one() to get "3text" and "6text" like below:
import requests
from bs4 import BeautifulSoup
html_doc='''
<table callspacing="0" cellpadding="0">
<tbody><tr>
<td>1text 2text</td>
<td>3text </td>
</tr>
<tr>
<td>4text 5text</td>
<td>6text </td>
</tr>
</tbody></table>
'''
soup = BeautifulSoup(html_doc, 'lxml')
soup1 = soup.select('tr')
for i in soup1:
print(i.select_one('td:nth-child(2)').text)
You can also use find_all method:
trs = soup.find('table').find_all('tr')
for i in trs:
tds = i.find_all('td')
print(tds[1].text)
Result:
3text
6text
best way is to use beautifulsoup
from bs4 import BeautifulSoup
html_doc='''
<table callspacing="0" cellpadding="0">
<tbody><tr>
<td>1text 2text</td>
<td>3text </td>
</tr>
<tr>
<td>4text 5text</td>
<td>6text </td>
</tr>
</tbody></table>
'''
soup = BeautifulSoup(html_doc, "html.parser")
# finds all tr tags
for i in soup.find_all("tr"):
# finds all td tags in tr tags
for k in i.find_all("td"):
# prints all td tags with a text format
print(k.text)
in this case it prints
1text 2text
3text
4text 5text
6text
but you can grab the texts you want with indexing. In this case you could just go with
# finds all tr tags
for i in soup.find_all("tr"):
# finds all td tags in tr tags
print(i.find_all("td")[1].text)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With