I need to extract the data present between a </b> ending tag and a <br> tag in the code snippet below:
<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>
What I need is: W, 65, 3
But these values can also be empty, like:
<td><b>First Type :</b><br><b>Second Type :</b><br><b>Third Type :</b></td>
I want to get these values if present, else an empty string.
I tried using nextSibling and find_next('br'), but when the values (W, 65, 3) are not present between the </b> and <br> tags, it returned
<br><b>Second Type :</b><br><b>Third Type :</b></br></br>
and
<br><b>Third Type :</b></br>
All I need is for it to return an empty string when nothing is present between those tags.
I would use a <b>-tag-by-<b>-tag strategy, looking at what kind of content each tag's next_sibling contains. I would just check whether its next_sibling.string is None, and append to the list accordingly :)
>>> from bs4 import BeautifulSoup
>>> html = """<td><b>First Type :</b><br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> data = []
>>> for tag in soup.find_all("b"):
...     sibling = tag.next_sibling
...     # No sibling, or a sibling without text (e.g. <br>), means the value is empty
...     if sibling is None or sibling.string is None:
...         data.append("")
...     else:
...         data.append(sibling.string)
...
>>> data
['', '65', '3']  # the first value (W) was left out of the input
Hope this helps!
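The same tag-by-tag idea can be wrapped in a small helper that also keeps each label, which makes it easy to see which value is missing. This is a minimal sketch, assuming bs4 4.x with html.parser; extract_values is a hypothetical name, and treating any non-text sibling (a <br> tag or no sibling at all) as an empty value is an assumption that fits the markup in the question.

```python
from bs4 import BeautifulSoup

def extract_values(td):
    """Map each <b> label in the cell to the text right after it ('' if none)."""
    values = {}
    for b in td.find_all("b"):
        # "First Type :" -> "First Type"
        label = b.get_text(strip=True).rstrip(":").strip()
        sibling = b.next_sibling
        # NavigableString subclasses str; a <br> Tag or None means the value is empty
        values[label] = sibling.strip() if isinstance(sibling, str) else ""
    return values

html = """<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>"""
td = BeautifulSoup(html, "html.parser").td
print(extract_values(td))  # {'First Type': 'W', 'Second Type': '65', 'Third Type': '3'}
```

Keying by label rather than position means a cell with a missing <b> block would not silently shift the remaining values.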
I would search for each td object and then use a regex pattern to filter out the data you need, instead of passing re.compile to the find_all method.
Like this:
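import re
from bs4 import BeautifulSoup
example = """<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>
<td><b>First Type :</b><br><b>Second Type :</b>69<br><b>Third Type :</b>6</td>"""
soup = BeautifulSoup(example, "html.parser")
for o in soup.find_all('td'):
    # Capture the (possibly empty) text after each </b>, up to the next <br, </br or </td>
    match = re.findall(r'</b>\s*(.*?)\s*(?:<br|</br|</td>)', str(o))
    print(",".join(match))
This pattern captures the (possibly empty) text between each </b> tag and the following <br>, </br> or </td> tag. Some BeautifulSoup versions add closing </br> tags when converting the soup object to a string (as in the output shown in the question); newer versions render <br/> instead, which the <br alternative also covers, and </td> catches the last value in the cell.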
This example outputs:
W,65,3
,69,6
This is just an example: an empty value already comes back as an empty string (hence the leading comma in ,69,6), so you can adapt the output to whatever format you need.
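To check how the pattern copes with the different ways a <br> can come out of str(), the regex can be exercised on plain strings with nothing but the re module. This is a sketch: VALUE_RE and values_from_cell are hypothetical names, </td> is included as a stop so the last value in a cell is caught even when no </br> follows it, and the sample strings are assumptions modelled on the older (</br></br>) and newer (<br/>) serializations.

```python
import re

# Capture the (possibly empty) text after each </b>, stopping at the
# first <br>, <br/>, </br> or the closing </td> of the cell.
VALUE_RE = re.compile(r'</b>\s*(.*?)\s*(?:<br|</br|</td>)')

def values_from_cell(cell_html):
    """Return the values of one serialized <td>, '' where a value is missing."""
    return VALUE_RE.findall(cell_html)

old_style = '<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</br></br></td>'
new_style = '<td><b>First Type :</b><br/><b>Second Type :</b>69<br/><b>Third Type :</b>6</td>'
empty = '<td><b>First Type :</b><br/><b>Second Type :</b><br/><b>Third Type :</b></td>'

print(values_from_cell(old_style))  # ['W', '65', '3']
print(values_from_cell(new_style))  # ['', '69', '6']
print(values_from_cell(empty))      # ['', '', '']
```

Because the group is non-greedy, each match stops at the first stop tag, so a missing value yields an empty string instead of swallowing the next label.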