
How to insert a blank space (&nbsp;) into a BeautifulSoup tag?

I'm trying to add an '&nbsp;' to a BeautifulSoup tag. When I assign it to tag.string, BS outputs &amp;nbsp; instead of &nbsp;. It must be some encoding issue, but I can't figure it out.

from bs4 import BeautifulSoup

html = "<td><span></span></td>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("td")
# Setting .string replaces the tag's contents with this literal text
tag.string = "&nbsp;"

The current output contains "&amp;nbsp;" instead of "&nbsp;".

Any ideas?

fat fantasma asked Mar 05 '14 02:03



2 Answers

By default, BeautifulSoup uses the "minimal" output formatter, which escapes bare ampersands. That is why the literal text "&nbsp;" comes out as "&amp;nbsp;".

One solution is to set the output formatter to None. Here is an excerpt from the BS source (PageElement docstring):

# There are five possible values for the "formatter" argument passed in
# to methods like encode() and prettify():
#
# "html" - All Unicode characters with corresponding HTML entities
#   are converted to those entities on output.
# "minimal" - Bare ampersands and angle brackets are converted to
#   XML entities: &amp; &lt; &gt;
# None - The null formatter. Unicode characters are never
#   converted to entities.  This is not recommended, but it's
#   faster than "minimal".
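As a rough illustration of the formatter values quoted above, the sketch below serializes the same small document with each of them. It assumes bs4 is installed; the variable names are made up for this example, and decode() is used instead of prettify() so that the output has no added indentation.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td><span></span></td>", "html.parser")
span = soup.find("span")

# Case 1: the literal text "&nbsp;" (six characters starting with a bare ampersand).
span.string = "&nbsp;"
literal_minimal = soup.decode(formatter="minimal")  # the ampersand gets escaped
literal_none = soup.decode(formatter=None)          # the text is written out untouched

# Case 2: the actual non-breaking space character, U+00A0.
span.string = "\xa0"
char_minimal = soup.decode(formatter="minimal")     # the character is left as it is
char_html = soup.decode(formatter="html")           # the character becomes an entity

for label, value in [
    ("literal, minimal", literal_minimal),
    ("literal, None", literal_none),
    ("U+00A0, minimal", char_minimal),
    ("U+00A0, html", char_html),
]:
    print(label, "->", repr(value))
```

In this sketch only the None formatter passes the literal "&nbsp;" through unchanged, while the "html" formatter produces "&nbsp;" only when the string holds the real U+00A0 character.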

Example:

from bs4 import BeautifulSoup


html = "<td><span></span></td>"
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find("span")
tag.string = '&nbsp;'

print(soup.prettify(formatter=None))

prints:

<td>
 <span>
  &nbsp;
 </span>
</td>
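The docstring calls the None formatter "not recommended". The sketch below hints at one likely reason: with no escaping at all, any other bare ampersand or angle bracket in the text is also written out raw. The sample string is invented for illustration, and bs4 is assumed to be installed.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td><span></span></td>", "html.parser")
# Hypothetical text that mixes the entity with characters that need escaping.
soup.find("span").string = "R&D <draft> &nbsp;"

escaped = soup.decode(formatter="minimal")  # everything is escaped, including &nbsp;
raw = soup.decode(formatter=None)           # nothing is escaped

print(escaped)
print(raw)

# Parsing the raw output again turns the text "<draft>" into a real tag.
reparsed = BeautifulSoup(raw, "html.parser")
print(reparsed.find("draft") is not None)
```

So formatter=None fits best when the inserted strings are fully under your control.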

Hope that helps.

alecxe answered Oct 19 '22 10:10


alecxe's answer works if you don't mind using formatter=None, but it doesn't help if you want to insert an &nbsp; into HTML that you output with a specific formatter (such as "html5" or "html").

I've found that Muposat's suggestion of using "\xa0" (the non-breaking space character itself, rather than the entity text) does the trick for me.

So, to adapt alecxe's answer:

from bs4 import BeautifulSoup

html = "<td><span></span></td>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("span")
tag.string = "\xa0"

print(soup.prettify(formatter="html5"))

prints:

<td>
 <span>
  &nbsp;
 </span>
</td>

This is using Python 3.7.
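Why "\xa0" works can be checked with the standard library alone: "&nbsp;" is the named entity for the character U+00A0, so assigning "\xa0" stores the character itself and leaves the choice of entity to the formatter. A small sketch, not tied to BeautifulSoup:

```python
import html
import unicodedata

# Decoding the entity yields the single character U+00A0.
nbsp_char = html.unescape("&nbsp;")
print(nbsp_char == "\xa0")
print(unicodedata.name(nbsp_char))

# Escaping the literal entity text escapes its ampersand,
# which is the same effect the question ran into.
print(html.escape("&nbsp;"))
```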

Phil Gyford answered Oct 19 '22 11:10