How do I unescape HTML entities in a string in Python 3.1? [duplicate]

Tags: python, html, curl, python-3.x, entities

I have looked all around and only found solutions for Python 2.6 and earlier, NOTHING on how to do this in Python 3.x. (I only have access to a Windows 7 box.)

I HAVE to be able to do this in 3.1, preferably without external libraries. Currently, I have httplib2 installed and access to command-prompt curl (that's how I'm getting the source code for pages). Unfortunately, as far as I know, curl does not decode HTML entities; I couldn't find an option for it in the documentation.

YES, I've tried to get Beautiful Soup to work in 3.x, MANY TIMES, without success. If you could provide EXPLICIT instructions on how to get it to work in Python 3 in an MS Windows environment, I would be very grateful.

So, to be clear, I need to turn a string like this: "Suzy &amp; John" into a string like this: "Suzy & John".

683

asked Mar 02 '10 02:03

VolatileRig

2 Answers

You could use the function html.unescape:

In Python 3.4+ (thanks to J.F. Sebastian for the update):

    import html
    html.unescape('Suzy &amp; John')  # 'Suzy & John'
    html.unescape('&quot;')  # '"'

In Python 3.3 or older:

    import html.parser
    html.parser.HTMLParser().unescape('Suzy &amp; John')

In Python 2:

    import HTMLParser
    HTMLParser.HTMLParser().unescape('Suzy &amp; John')
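If the same script has to run on several of these versions, the three variants above can probably be folded into one import fallback. This is only a sketch: the module-level name `unescape` is my own choice, and it assumes the older interpreters still expose `HTMLParser().unescape` (it no longer exists in recent Python 3 releases, where the first branch is taken anyway).

```python
# Pick whichever unescape implementation the running interpreter provides.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
    # Bind the method of a parser instance to a plain function name.
    unescape = HTMLParser().unescape

print(unescape('Suzy &amp; John'))
```

After this, the rest of the code can call `unescape(...)` without caring which version it is running on.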

113

answered Oct 11 '22 21:10

unutbu

You can use xml.sax.saxutils.unescape for this purpose. This module is included in the Python standard library, and is portable between Python 2.x and Python 3.x.

    >>> import xml.sax.saxutils as saxutils
    >>> saxutils.unescape("Suzy &amp; John")
    'Suzy & John'
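One caveat worth knowing: by default `saxutils.unescape` only handles `&amp;`, `&lt;` and `&gt;`. Other entities pass through untouched unless you supply them in the optional `entities` dictionary. A small sketch of the difference follows; the `extra_entities` mapping is just an illustration, not a complete HTML entity table.

```python
import xml.sax.saxutils as saxutils

# Illustrative mapping only; add whatever entities your input actually uses.
extra_entities = {'&quot;': '"', '&apos;': "'"}

text = '&quot;Suzy&quot; &amp; John'

# Without the mapping, &quot; is left as-is and only &amp; is decoded.
default_result = saxutils.unescape(text)

# With the mapping, the listed entities are decoded as well.
extended_result = saxutils.unescape(text, extra_entities)

print(default_result)
print(extended_result)
```

For arbitrary HTML with named or numeric entities, an HTML-aware function such as `html.unescape` is likely the safer choice where it is available.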

answered Oct 11 '22 21:10

Greg Hewgill
