Suggestions on get_text() in BeautifulSoup

Tags:

python

beautifulsoup

I am using BeautifulSoup to parse some content from an HTML page.

I can extract the content I want from the HTML (i.e. the text contained in a span with the class myclass).

result = mycontent.find(attrs={'class':'myclass'})

I obtain this result:

<span class="myclass">Lorem ipsum<br/>dolor sit amet,<br/>consectetur...</span>
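
A minimal sketch of this lookup, assuming bs4 is installed and parsing a hypothetical inline copy of the span with Python's built-in "html.parser". It also shows the class_ keyword, which bs4 accepts as an alternative to attrs={'class': ...} because class is a reserved word in Python:

```python
from bs4 import BeautifulSoup

# Hypothetical inline copy of the markup from the question
html = '<span class="myclass">Lorem ipsum<br/>dolor sit amet,<br/>consectetur...</span>'
mycontent = BeautifulSoup(html, "html.parser")

# Lookup as written in the question: any tag whose class list includes "myclass"
result = mycontent.find(attrs={'class': 'myclass'})

# Equivalent lookup restricted to <span> tags, using bs4's class_ keyword
same = mycontent.find("span", class_="myclass")

print(result is same)  # True: both calls return the same Tag object
```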

If I try to extract the text using:

result.get_text()

I obtain:

Lorem ipsumdolor sit amet,consectetur...

As you can see, when the <br/> tags are removed there is no longer any space between the fragments, and the words on either side of each tag are concatenated.
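
A minimal, self-contained reproduction of the behaviour described above, assuming bs4 with the built-in "html.parser" and a hypothetical inline copy of the markup from the question. get_text() joins the tag's text fragments with an empty separator by default, and <br/> contributes no text of its own, so the fragments run together:

```python
from bs4 import BeautifulSoup

# Hypothetical inline copy of the markup from the question
html = '<span class="myclass">Lorem ipsum<br/>dolor sit amet,<br/>consectetur...</span>'
result = BeautifulSoup(html, "html.parser").find(attrs={'class': 'myclass'})

# <br/> contributes no text and get_text() joins fragments with "" by default
text = result.get_text()
print(text)  # Lorem ipsumdolor sit amet,consectetur...

# The fragments themselves are still separate strings in the tree
fragments = [str(s) for s in result.strings]
print(fragments)  # ['Lorem ipsum', 'dolor sit amet,', 'consectetur...']
```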

How can I solve this issue?

asked Apr 20 '13 13:04

user601836

1 Answer

If you are using bs4, you can use the strings generator and join its pieces with a space:

" ".join(result.strings)

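A runnable sketch of this approach on the markup from the question, assuming bs4 and the built-in "html.parser". For comparison it also shows two related bs4 options: passing a separator to get_text(), and stripped_strings, which skips whitespace-only fragments and trims the rest:

```python
from bs4 import BeautifulSoup

# Hypothetical inline copy of the markup from the question
html = '<span class="myclass">Lorem ipsum<br/>dolor sit amet,<br/>consectetur...</span>'
result = BeautifulSoup(html, "html.parser").find(attrs={'class': 'myclass'})

# The answer's approach: join every text fragment inside the tag with a space
joined = " ".join(result.strings)

# Same effect via get_text(): its first positional argument is the separator
via_get_text = result.get_text(" ")

# stripped_strings drops whitespace-only fragments and strips the others,
# which avoids doubled spaces when the markup contains newlines or indentation
via_stripped = " ".join(result.stripped_strings)

print(joined)  # Lorem ipsum dolor sit amet, consectetur...
```

For this particular span all three give the same string; stripped_strings only makes a difference when the fragments carry their own leading or trailing whitespace.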
answered Sep 25 '22 11:09

Sean Vieira
