
How do I unescape HTML entities in a string in Python 3.1? [duplicate]

I have looked all around and only found solutions for python 2.6 and earlier, NOTHING on how to do this in python 3.X. (I only have access to Win7 box.)

I HAVE to be able to do this in 3.1, preferably without external libraries. Currently, I have httplib2 installed and access to command-prompt curl (that's how I'm getting the source code for pages). Unfortunately, as far as I know, curl does not decode HTML entities; I couldn't find an option for it in the documentation.

YES, I've tried to get Beautiful Soup to work, MANY TIMES without success in 3.X. If you could provide EXPLICIT instructions on how to get it to work in python 3 in MS Windows environment, I would be very grateful.

So, to be clear, I need to turn a string like this: "Suzy &amp; John" into a string like this: "Suzy & John".

VolatileRig asked Mar 02 '10 02:03




2 Answers

You could use the function html.unescape:

In Python 3.4+ (thanks to J.F. Sebastian for the update):

    import html
    html.unescape('Suzy &amp; John')
    # 'Suzy & John'

    html.unescape('&quot;')
    # '"'

In Python 3.3 or older:

    import html.parser
    html.parser.HTMLParser().unescape('Suzy &amp; John')

In Python 2:

    import HTMLParser
    HTMLParser.HTMLParser().unescape('Suzy &amp; John')
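
If the same script has to run on several of these Python versions, the three variants can be folded into one import fallback. This is only a sketch: the fallback order is an assumption about which interpreters you need to support, and the name `unescape` is just a local alias chosen for illustration.

```python
# Pick whichever unescape implementation this interpreter provides.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
    # Older versions expose unescape as a method on a parser instance.
    unescape = HTMLParser().unescape

print(unescape('Suzy &amp; John'))
print(unescape('&quot;quoted&quot;'))
```

On current Python 3 releases only the first import is ever used, so the fallback branches matter only for legacy interpreters.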
unutbu answered Oct 11 '22 21:10


You can use xml.sax.saxutils.unescape for this purpose. This module is included in the Python standard library, and is portable between Python 2.x and Python 3.x.

    >>> import xml.sax.saxutils as saxutils
    >>> saxutils.unescape("Suzy &amp; John")
    'Suzy & John'
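
One caveat worth illustrating: by default `xml.sax.saxutils.unescape` only converts `&amp;`, `&lt;` and `&gt;`. Other entities have to be supplied through its optional `entities` dictionary. The sketch below shows this; the `extra_entities` mapping is a hypothetical example, not a complete HTML entity table.

```python
from xml.sax.saxutils import unescape

# The three XML basics are handled without any extra arguments.
basic = unescape('Suzy &amp; John &lt;3')

# &quot; is not in the default set, so it is left untouched.
untouched = unescape('&quot;hi&quot;')

# Supplying a mapping makes unescape replace those entities as well.
extra_entities = {'&quot;': '"', '&apos;': "'"}  # illustrative, not exhaustive
converted = unescape('&quot;hi&quot;', extra_entities)

print(basic)
print(untouched)
print(converted)
```

For arbitrary HTML, which can contain named entities such as `&eacute;` or numeric references, `html.unescape` is probably the more convenient choice where it is available.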
Greg Hewgill answered Oct 11 '22 21:10