Extracting text outside of a <div> tag BeautifulSoup

Question

I'm practicing my scraping and I came across something like this:

<div class="profileDetail">
    <div class="profileLabel">Mobile : </div>
     021 427 399 
</div>

I need the number that sits outside of the inner <div> tag.

My code is:

num = soup.find("div",{"class":"profileLabel"}).text

but the output of that is only "Mobile : ", which is the text inside the <div> tag, not the text outside of it.

So how do I extract the text outside of the <div> tag?

alecxe · Accepted Answer

I would make a reusable function to get the value by label, finding the label by text and getting the next sibling:

import re

def find_by_label(soup, label):
    return soup.find("div", text=re.compile(label)).next_sibling

Usage:

find_by_label(soup, "Mobile").strip()  # returns "021 427 399"
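
For anyone who wants to run this end to end, here is a minimal sketch against the markup from the question. It assumes Python 3 with bs4 installed and uses the built-in html.parser. Two things here are my own choices rather than part of the answer above: it passes string= instead of text= (the spelling newer Beautiful Soup releases prefer), and it wraps the label in re.escape so that regex metacharacters in a label are matched literally. The None handling is also an addition.

```python
import re

from bs4 import BeautifulSoup

# Markup copied from the question.
HTML = """
<div class="profileDetail">
    <div class="profileLabel">Mobile : </div>
     021 427 399 
</div>
"""


def find_by_label(soup, label):
    """Return the stripped text node that follows the <div> containing `label`."""
    # Only a <div> whose single child is a string can match string=...,
    # so the outer profileDetail <div> is skipped automatically.
    label_div = soup.find("div", string=re.compile(re.escape(label)))
    if label_div is None or label_div.next_sibling is None:
        return None
    # next_sibling is the node right after the closing </div> of the label.
    return str(label_div.next_sibling).strip()


soup = BeautifulSoup(HTML, "html.parser")
number = find_by_label(soup, "Mobile")
print(number)
```

Returning None for a missing label is just one option; raising an exception may suit a scraper better if a missing field should stop the run.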

Brandon Nadeau · Answer

Try soup.find("div", {"class": "profileLabel"}).next_sibling. This grabs the next node, which can be either a bs4.Tag or a bs4.NavigableString.

A bs4.NavigableString is what you're trying to get in this case.

elem = soup.find("div", {"class": "profileLabel"}).next_sibling
print(type(elem))

# Should print
<class 'bs4.element.NavigableString'>
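
Because next_sibling can be a Tag, a NavigableString, or None when there is nothing after the element, it may be worth checking the type before using the result. The following is only a sketch under that assumption; text_after is a hypothetical helper name, not something bs4 provides.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def text_after(tag):
    """Return the stripped text of the node that follows `tag`, or None."""
    sibling = tag.next_sibling
    if isinstance(sibling, NavigableString):
        # Plain text node, as in the question's markup.
        return str(sibling).strip()
    if isinstance(sibling, Tag):
        # The next node is an element, so take the text inside it.
        return sibling.get_text(strip=True)
    # No following node at all.
    return None


html = (
    '<div class="profileDetail">'
    '<div class="profileLabel">Mobile : </div> 021 427 399 '
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
label = soup.find("div", {"class": "profileLabel"})
value = text_after(label)
print(value)
```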

Example:

In [4]: s = bs4.BeautifulSoup('<div> Hello </div>HiThere<p>next_items</p>', 'html5lib')

In [5]: s
Out[5]: <html><head></head><body><div> Hello </div>HiThere<p>next_items</p></body></html>

In [6]: s.div
Out[6]: <div> Hello </div>

In [7]: s.div.next_sibling
Out[7]: u'HiThere'

In [8]: type(s.div.next_sibling)
Out[8]: bs4.element.NavigableString
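
The session above was recorded on Python 2 with the html5lib parser. A rough Python 3 equivalent, assuming only bs4 and the built-in html.parser (so no <html>/<body> wrapper is added), might look like this:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

s = BeautifulSoup("<div> Hello </div>HiThere<p>next_items</p>", "html.parser")

# The node right after </div> is the bare text, not the <p> element.
sibling = s.div.next_sibling
print(sibling)
print(isinstance(sibling, NavigableString))

# Stepping once more reaches the <p> tag.
after_text = sibling.next_sibling
print(after_text.name)
```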

Tags: python, html, html-parsing, beautifulsoup

Asked by Zion
