I'm practicing my scraping and I came across something like this:
<div class="profileDetail">
<div class="profileLabel">Mobile : </div>
021 427 399
</div>
and I need the number outside of the inner <div> tag.
My code is:
num = soup.find("div",{"class":"profileLabel"}).text
but the output of that is only Mobile : which is the text inside the <div> tag, not the text outside of it.
So how do I extract the text outside of the <div> tag?
I would make a reusable function to get the value by label, finding the label by text
and getting the next sibling:
import re

def find_by_label(soup, label):
    # Find the <div> whose text matches the label, then return the node right after it.
    return soup.find("div", text=re.compile(label)).next_sibling
Usage:
find_by_label(soup, "Mobile").strip()  # returns "021 427 399"
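For anyone who wants to run this end to end, here is a minimal self-contained sketch against the markup from the question. It is a variation on the function above, not the exact original: it assumes a reasonably recent bs4 where the string= keyword is available (the newer name for text=), uses the built-in "html.parser", escapes the label so it is matched literally, and returns None instead of raising when the label is missing. The HTML constant is just the snippet from the question.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import NavigableString

# The markup from the question, inlined so the example is self-contained.
HTML = """
<div class="profileDetail">
<div class="profileLabel">Mobile : </div>
021 427 399
</div>
"""


def find_by_label(soup, label):
    """Return the stripped text that directly follows the <div> containing `label`."""
    # re.escape treats the label as literal text rather than as a regex pattern.
    label_div = soup.find("div", string=re.compile(re.escape(label)))
    if label_div is None:
        return None  # no such label on the page
    sibling = label_div.next_sibling
    # Only a text node is useful here; a following tag (or nothing) yields None.
    if not isinstance(sibling, NavigableString):
        return None
    return sibling.strip()


soup = BeautifulSoup(HTML, "html.parser")
mobile = find_by_label(soup, "Mobile")
print(mobile)
```

Only the inner <div> matches, because the outer one has several children and therefore no single string to compare against.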
Try using soup.find("div",{"class":"profileLabel"}).next_sibling. This grabs the next node, which can be either a bs4.Tag or a bs4.NavigableString. A bs4.NavigableString is what you're trying to get in this case.
elem = soup.find("div",{"class":"profileLabel"}).next_sibling
print(type(elem))
# Should print <class 'bs4.element.NavigableString'>
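Since next_sibling can hand back a Tag as well as a NavigableString, it may help to walk the siblings until some actual text turns up. The sketch below is one way to do that with next_siblings; text_after is a made-up helper name, and the <br/> in the sample markup is an assumption added only to show the case where the immediate sibling is a tag.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def text_after(tag):
    """Return the first non-blank text node following `tag`, or None."""
    for sibling in tag.next_siblings:
        # Skip tags and whitespace-only strings; keep the first real text.
        if isinstance(sibling, NavigableString) and sibling.strip():
            return sibling.strip()
    return None


# Hypothetical markup: a <br/> sits between the label and the number.
html = '<div class="profileLabel">Mobile : </div><br/>\n 021 427 399 <p>next</p>'
soup = BeautifulSoup(html, "html.parser")

label = soup.find("div", {"class": "profileLabel"})
first = label.next_sibling   # the <br/> tag, not the number
value = text_after(label)    # the text node after the <br/>
print(type(first).__name__, repr(value))
```

With the markup from the question there is no tag in between, so plain next_sibling is enough; the loop only matters when the page is less tidy.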
In [4]: s = bs4.BeautifulSoup('<div> Hello </div>HiThere<p>next_items</p>', 'html5lib')
In [5]: s
Out[5]: <html><head></head><body><div> Hello </div>HiThere<p>next_items</p></body></html>
In [6]: s.div
Out[6]: <div> Hello </div>
In [7]: s.div.next_sibling
Out[7]: u'HiThere'
In [8]: type(s.div.next_sibling)
Out[8]: bs4.element.NavigableString