I'm trying to add a '&nbsp;' into a BeautifulSoup tag. BS converts the tag.string to \&amp;nbsp; instead of \&nbsp;. It has to be some encoding issue but I can't figure it out.
Please note: ignore the backslash ('\') character. I had to add it so Stack Overflow would format my question correctly.
from bs4 import BeautifulSoup
html = "<td><span></span></td>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("td")
tag.string = "&nbsp;"
Current output is html = "\&amp;nbsp;"
Any ideas?
By default, BeautifulSoup uses the "minimal" output formatter, which escapes bare ampersands, so the & in &nbsp; is written out as &amp;.
The solution is to set the output formatter to None. Quoting the BS source (PageElement docstring):
# There are five possible values for the "formatter" argument passed in
# to methods like encode() and prettify():
#
# "html" - All Unicode characters with corresponding HTML entities
# are converted to those entities on output.
# "minimal" - Bare ampersands and angle brackets are converted to
# XML entities: &amp; &lt; &gt;
# None - The null formatter. Unicode characters are never
# converted to entities. This is not recommended, but it's
# faster than "minimal".
Example:
from bs4 import BeautifulSoup
html = "<td><span></span></td>"
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find("span")
tag.string = '&nbsp;'
print(soup.prettify(formatter=None))
prints:
<td>
 <span>
  &nbsp;
 </span>
</td>
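The difference between the default formatter and formatter=None can be checked without prettify(). This is only a minimal sketch, assuming bs4 is installed and the built-in html.parser is used; the variable names default_out and raw_out are just for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td><span></span></td>", "html.parser")
span = soup.find("span")

# This is a literal 6-character string, not a parsed entity.
span.string = "&nbsp;"

# Default ("minimal") formatter: the bare ampersand gets escaped.
default_out = soup.decode()

# Null formatter: the string is written out untouched.
raw_out = soup.decode(formatter=None)

print(default_out)  # <td><span>&amp;nbsp;</span></td>
print(raw_out)      # <td><span>&nbsp;</span></td>
```

Keep in mind that formatter=None also leaves any other bare &, < or > in the document unescaped, which is why the docstring above calls it not recommended.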
Hope that helps.
Although the answer by alecxe works if you don't mind using formatter=None, that's not useful if you want to insert an &nbsp; into some HTML that you do want output with a specific formatter (like "html5" or "html").
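I've found that Muposat's suggestion of using "\xa0" (the non-breaking space character itself, U+00A0) does the trick for me: the "html" and "html5" formatters write that character out as &nbsp;.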
So, to adapt alecxe's answer:
from bs4 import BeautifulSoup
html = "<td><span></span></td>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("span")
tag.string = "\xa0"
print(soup.prettify(formatter="html5"))
prints:
<td>
 <span>
  &nbsp;
 </span>
</td>
This is using Python 3.7.
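As for why "\xa0" is the right thing to assign: &nbsp; is just the HTML spelling of the character U+00A0, while the text "&nbsp;" typed into a Python string is six ordinary characters starting with an ampersand. A small sketch using only the standard library (not bs4 itself) shows both sides; the variable names are only for illustration.

```python
import html
import unicodedata

# Decoding the entity gives the single character U+00A0.
nbsp = html.unescape("&nbsp;")
print(nbsp == "\xa0")          # True
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

# Escaping the literal text "&nbsp;" escapes its ampersand,
# which is the same effect seen in the question.
escaped = html.escape("&nbsp;")
print(escaped)                 # &amp;nbsp;
```

So assigning the character and letting an entity-aware formatter turn it back into &nbsp; avoids the double escaping.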