Is it possible to split the text of a tag at its br tags?
I have these tag contents: [u'+420 777 593 531', <br/>, u'+420 776 593 531', <br/>, u'+420 775 593 531']
And I want to get only the numbers. Any advice?
EDIT:
[x for x in dt.find_next_sibling('dd').contents if x!=' <br/>']
Does not work at all.
Step 1: To scrape a page, import the BeautifulSoup module for parsing and the requests module for fetching the website. Step 2: Request the URL by calling the get method.
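The two steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names fetch_soup and parse_html are hypothetical, the choice of the "html.parser" backend is an assumption, and any URL passed in is up to the caller.

```python
import requests
from bs4 import BeautifulSoup


def parse_html(html):
    """Parse an HTML string with the stdlib-backed parser (an assumed choice)."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Step 2: request the URL with requests.get, then parse the body."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_html(response.text)


# Offline check of the parsing half, using a hypothetical snippet.
sample = parse_html("<dd>+420 777 593 531<br/>+420 776 593 531</dd>")
print(sample.dd.get_text(" | "))
```

Keeping the parsing in its own function makes it possible to try the extraction logic on a literal string before touching the network.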
BeautifulSoup is a Python package that can parse broken HTML, as lxml also does by building on the libxml2 parser.
The NavigableString object represents the text contents of a tag. To access it, use .string on the tag. You can replace the string with another string, but you can't edit the existing string in place.
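As a small illustration of the point about .string, the sketch below reads a tag's NavigableString and swaps it for a new one with replace_with. The markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup("<b>old text</b>", "html.parser")
tag = soup.b

text = tag.string                 # the tag's single text child
print(type(text).__name__)        # NavigableString

# The string itself cannot be edited in place; replace it instead.
tag.string.replace_with("new text")
print(tag)                        # <b>new text</b>
```

After the replacement, the old object held in text is detached from the tree and still reads "old text".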
Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as non-closed tags (it is named after "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
You need to test for tags, which are modelled as Tag instances. Tag objects have a name attribute holding the tag name, such as 'br', while text nodes (which are NavigableString instances) have no tag name:
[x for x in dt.find_next_sibling('dd').contents if getattr(x, 'name', None) != 'br']
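To make the difference from the attempt in the question concrete, here is a hedged sketch on a made-up <dd> snippet. It shows that comparing a tag with the string ' <br/>' never matches, and that an isinstance check against Tag is an alternative to the name test; which of the two reads better is a matter of taste.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for the <dd> element in the question.
soup = BeautifulSoup(
    "<dd>+420 777 593 531<br/>+420 776 593 531<br/>+420 775 593 531</dd>",
    "html.parser",
)
dd = soup.dd

# A Tag is never equal to a plain string, so `x != ' <br/>'` is always
# true and filters nothing out.
br = dd.contents[1]
print(br != " <br/>")  # True

# Keep everything whose tag name is not 'br' ...
by_name = [str(x) for x in dd.contents if getattr(x, "name", None) != "br"]
# ... or, alternatively, keep everything that is not a Tag at all.
by_type = [str(x) for x in dd.contents if not isinstance(x, Tag)]

print(by_name)
print(by_type)
```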
Since you appear to only have text and <br /> elements in that <dd> element, you may as well just get all the contained strings instead:
list(dt.find_next_sibling('dd').stripped_strings)
Demo:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <dt>Term</dt>
... <dd>
...  +420 777 593 531<br/>
...  +420 776 593 531<br/>
...  +420 775 593 531<br/>
... </dd>
... ''')
>>> dt = soup.dt
>>> [x for x in dt.find_next_sibling('dd').contents if getattr(x, 'name', None) != 'br']
[u'\n +420 777 593 531', u'\n +420 776 593 531', u'\n +420 775 593 531', u'\n']
>>> list(dt.find_next_sibling('dd').stripped_strings)
[u'+420 777 593 531', u'+420 776 593 531', u'+420 775 593 531']
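The session above is from Python 2, hence the u'' prefixes. For reference, a Python 3 version of the same idea might look like the sketch below. The function name numbers_after and the sample markup are assumptions made for illustration, and the parser is named explicitly, which current Beautiful Soup versions recommend.

```python
from bs4 import BeautifulSoup

HTML = """
<dt>Phone</dt>
<dd>
  +420 777 593 531<br/>
  +420 776 593 531<br/>
  +420 775 593 531
</dd>
"""


def numbers_after(dt):
    """Return the text pieces of the <dd> following dt, split at <br/> tags."""
    dd = dt.find_next_sibling("dd")
    # stripped_strings yields each text node with surrounding whitespace
    # removed and skips nodes that are whitespace only.
    return list(dd.stripped_strings)


soup = BeautifulSoup(HTML, "html.parser")
numbers = numbers_after(soup.dt)
print(numbers)
# ['+420 777 593 531', '+420 776 593 531', '+420 775 593 531']
```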