Is it possible to split the text of a tag by <br> tags? The tag's contents look like this:

[u'+420 777 593 531', <br/>, u'+420 776 593 531', <br/>, u'+420 775 593 531']

and I want to get only the numbers. Any advice?

EDIT:

[x for x in dt.find_next_sibling('dd').contents if x!=' ']

does not work at all.


Beautifulsoup split text in tag by <br/>

Tags: python, text, newline, beautifulsoup

Milano

1 Answer

You need to test for tags, which are modelled as Tag instances. Tag objects have a name attribute holding the tag name, while text nodes (NavigableString instances) are not tags and have no tag name, so getattr() with a None default lets a single test cover both:

[x for x in dt.find_next_sibling('dd').contents if getattr(x, 'name', None) != 'br']
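To see why the getattr() call with a None default works on a mixed list, here is a minimal, library-free sketch. The FakeTag class is a hypothetical stand-in for a BeautifulSoup tag (it only carries a name), and plain str values stand in for text nodes; none of this is BeautifulSoup's own API:

```python
class FakeTag:
    """Hypothetical stand-in for a parsed tag; it only carries a tag name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<{self.name}/>"


def drop_br(children):
    """Keep every child that is not a <br> tag.

    Plain strings have no .name attribute, so getattr() falls back to None
    and they pass the filter; FakeTag('br') instances are dropped.
    """
    return [x for x in children if getattr(x, "name", None) != "br"]


children = [
    "+420 777 593 531", FakeTag("br"),
    "+420 776 593 531", FakeTag("br"),
    "+420 775 593 531",
]
print(drop_br(children))
```

Because str has no name attribute, getattr() returns the None default for the strings instead of raising AttributeError, so only the br entries are filtered out.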

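Since you appear to have only text and <br/> elements in that <dd> element, you may as well just get all the contained strings instead. stripped_strings yields each string with its leading and trailing whitespace removed and skips strings that consist only of whitespace: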

list(dt.find_next_sibling('dd').stripped_strings)
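If BeautifulSoup isn't available, a similar split can be approximated with the standard library's html.parser module. This is a swapped-in technique, not how stripped_strings is implemented: the hypothetical BrSplitter class below starts a new text chunk at every <br> tag, and the hypothetical split_on_br helper then strips whitespace and drops empty chunks. Unlike stripped_strings, it joins text that is split across other inline tags such as <b> into one chunk.

```python
from html.parser import HTMLParser


class BrSplitter(HTMLParser):
    """Collect text, starting a new chunk at every <br> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = [""]

    def handle_starttag(self, tag, attrs):
        # HTMLParser's default handle_startendtag() calls handle_starttag(),
        # so self-closing <br/> ends up here as well as plain <br>.
        if tag == "br":
            self.chunks.append("")

    def handle_data(self, data):
        # Text may arrive in several pieces, so append rather than assign.
        self.chunks[-1] += data


def split_on_br(html):
    parser = BrSplitter()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    # Strip surrounding whitespace and drop chunks that were only whitespace.
    return [c.strip() for c in parser.chunks if c.strip()]


html = """<dd>
    +420 777 593 531<br/>
    +420 776 593 531<br/>
    +420 775 593 531<br/>
</dd>"""
print(split_on_br(html))
```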

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <dt>Term</dt>
... <dd>
...     +420 777 593 531<br/>
...     +420 776 593 531<br/>
...     +420 775 593 531<br/>
... </dd>
... ''')
>>> dt = soup.dt
>>> [x for x in dt.find_next_sibling('dd').contents if getattr(x, 'name', None) != 'br']
[u'\n    +420 777 593 531', u'\n    +420 776 593 531', u'\n    +420 775 593 531', u'\n']
>>> list(dt.find_next_sibling('dd').stripped_strings)
[u'+420 777 593 531', u'+420 776 593 531', u'+420 775 593 531']

139

answered Sep 19 '22 22:09

Martijn Pieters
