How to get value between two different tags using beautiful soup?

Question

I need to extract the data between a closing </b> tag and a <br> tag
in the snippet below:

<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>

What I need is: W, 65, 3

The problem is that these values can also be empty, like this:

<td><b>First Type :</b><br><b>Second Type :</b><br><b>Third Type :</b></td>

I want to get these values if they are present, and an empty string otherwise.

I tried using nextSibling and find_next('br'), but they returned

 <br><b>Second Type :</b><br><b>Third Type :</b></br></br>

and

<br><b>Third Type :</b></br>

when the values (W, 65, 3) are not present between the </b> and <br> tags.

All I need is for it to return an empty string if nothing is present between those tags.

pedropedro · Accepted Answer

I would use a <b> tag by <b> tag strategy, looking at what type of info each tag's next_sibling contains.

I would just check whether each tag's next_sibling.string is not None, and append to the list accordingly :)

>>> from bs4 import BeautifulSoup
>>> html = """<td><b>First Type :</b><br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>"""

>>> soup = BeautifulSoup(html, "html.parser")
>>> b = soup.find_all("b")
>>> data = []
>>> for tag in b:
...     sibling = tag.next_sibling
...     # sibling is None when <b> is the last node; a <br> tag has .string None
...     if sibling is None or sibling.string is None:
...         data.append("")
...     else:
...         data.append(str(sibling.string))
...
>>> data
['', '65', '3']  # the first value (W) was removed from the HTML
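If you want the same check as a reusable helper, here is a minimal sketch. The function name values_after_bold and the label clean-up are my own assumptions, not part of the original answer, and it assumes html.parser keeps <br> as an empty element rather than nesting the following tags inside it. It treats only a text node directly after </b> as a value:

```python
from bs4 import BeautifulSoup, NavigableString


def values_after_bold(html):
    """Map each <b> label to the text that follows it, or "" if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for tag in soup.find_all("b"):
        # "First Type :" -> "First Type" (hypothetical clean-up of the label)
        label = tag.get_text().rstrip(" :")
        sibling = tag.next_sibling
        # Only a text node counts as a value; a <br> tag or a missing
        # sibling (None) means the value is empty.
        if isinstance(sibling, NavigableString):
            result[label] = str(sibling).strip()
        else:
            result[label] = ""
    return result


full = "<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>"
empty = "<td><b>First Type :</b><br><b>Second Type :</b><br><b>Third Type :</b></td>"
print(values_after_bold(full))
print(values_after_bold(empty))
```

Checking the sibling's type instead of its .string also covers the last <b>, whose next_sibling is None when no value follows it.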

Hope this helps!

Zroq · Answer

I would search for the td elements and then use a regex pattern to pull out the data you need, instead of using re.compile in the find_all method.

Like this:

import re
from bs4 import BeautifulSoup

example = """<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>
<td><b>First Type :</b><br><b>Second Type :</b>69<br><b>Third Type :</b>6</td>"""

soup = BeautifulSoup(example, "html.parser")

for o in soup.find_all('td'):
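    # Capture the text between each </b> and the next <br>, </br> or </td>;
    # </td> covers the last value when the parser emits no </br>.
    match = re.findall(r'</b>\s*(.*?)\s*(<br|</br|</td)', str(o))
    print("%s,%s,%s" % (match[0][0], match[1][0], match[2][0]))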

This pattern captures the text between each </b> tag and the tag that follows it, such as <br> or </br>. Depending on the parser, </br> closing tags may be added when the soup object is converted to a string.

This example outputs:

W,65,3

,69,6

This is just an example. A missing value shows up as an empty string (as in the second row), and you can adapt the handling to your needs.
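To see how the empty case behaves without depending on how the parser serializes <br>, here is a minimal sketch that runs a simplified pattern directly on the raw strings from the question. The helper name extract_values and the simplified pattern are assumptions of mine, not the pattern used above, and it only suits small, regular snippets like this one:

```python
import re

# Capture whatever text sits between a closing </b> and the next tag of any kind.
VALUE_AFTER_BOLD = re.compile(r'</b>\s*([^<]*?)\s*<')


def extract_values(html):
    """Return the text after each </b>, using "" where nothing follows."""
    return VALUE_AFTER_BOLD.findall(html)


full = "<td><b>First Type :</b>W<br><b>Second Type :</b>65<br><b>Third Type :</b>3</td>"
empty = "<td><b>First Type :</b><br><b>Second Type :</b><br><b>Third Type :</b></td>"

print(",".join(extract_values(full)))
print(extract_values(empty))
```

Because the capture group excludes "<", it stops at whichever tag comes next (<br>, </br> or </td>), so the last value in the cell is picked up as well.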


Tags:

python

html-parsing

beautifulsoup

utkarsh awasthi
