
How to convert special characters into HTML entities?

I want to convert special characters like "%$!&@á é ©" in Python, and not only '<&">', which is all that the documentation and references I've found so far cover. cgi.escape doesn't solve the problem.

For example, the string "á ê ĩ &" should be converted to "&aacute; &ecirc; &itilde; &amp;".

Does anybody know how to solve it? I'm using Python 2.6.

Jayme Tosi Neto asked Mar 08 '12 11:03





2 Answers

You could build your own loop using the dictionaries you can find in http://docs.python.org/library/htmllib.html#module-htmlentitydefs

The one you're looking for is htmlentitydefs.codepoint2name, which maps Unicode code points to HTML entity names.
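As a rough sketch of such a loop (an illustration, not a tested library routine): the helper name escape_named is made up here, and the try/except import is an assumption added only so the same snippet also runs on Python 3, where the dictionary lives in html.entities. It expects a unicode string and leaves any character without a named entity untouched.

```python
try:
    from htmlentitydefs import codepoint2name  # Python 2
except ImportError:
    from html.entities import codepoint2name  # Python 3 (assumed fallback)


def escape_named(text):
    """Replace every character that has a named HTML entity with &name;.

    `text` must be a unicode string; characters with no entry in
    codepoint2name are copied through unchanged.
    """
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(u'&%s;' % name if name else ch)
    return u''.join(out)
```

For example, escape_named(u'\xe1 \xea &') gives u'&aacute; &ecirc; &amp;'. Because each character is looked at exactly once, the "&" inside an entity that was just written is never escaped a second time.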

Ruben Vermeersch answered Sep 30 '22 17:09



I found a solution based on the standard library by searching for htmlentitydefs.codepoint2name, which @Ruben Vermeersch mentioned in his answer. It comes from here: http://bytes.com/topic/python/answers/594350-convert-unicode-chars-html-entities

Here's the function:

from htmlentitydefs import codepoint2name

def htmlescape(text):
    # Accept UTF-8 byte strings as well as unicode objects (Python 2).
    if isinstance(text, str):
        text = text.decode('utf-8')

    # Escape "&" first so the "&" of the entities added below is not escaped again.
    text = text.replace(u"&", u"&amp;")
    for code, name in codepoint2name.iteritems():
        if code != 38:  # "&" (code point 38) was already handled above
            text = text.replace(unichr(code), u"&%s;" % name)
    return text
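One caveat: codepoint2name only covers the HTML 4 named entities, so a character such as ĩ from the question has no entry and is left as-is by the function above. A possible variant, sketched here under the assumption that a numeric character reference is acceptable where no name exists, walks the string once and falls back to &#NNN;. The name htmlescape_with_fallback is hypothetical, and the import guard is there only so that the snippet also runs on Python 3.

```python
try:
    from htmlentitydefs import codepoint2name  # Python 2
except ImportError:
    from html.entities import codepoint2name  # Python 3 (assumed fallback)


def htmlescape_with_fallback(text):
    """Escape a unicode string using named entities where they exist.

    Non-ASCII characters without a named entity become decimal numeric
    character references; plain ASCII without an entity is kept as-is.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code in codepoint2name:
            out.append(u'&%s;' % codepoint2name[code])
        elif code > 127:
            out.append(u'&#%d;' % code)  # assumption: numeric reference is acceptable
        else:
            out.append(ch)
    return u''.join(out)
```

With the string from the question, htmlescape_with_fallback(u'\xe1 \xea \u0129 &') returns u'&aacute; &ecirc; &#297; &amp;'.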

Thank you all for helping! ;)

Jayme Tosi Neto answered Sep 30 '22 16:09
