
How to convert unicode text to normal text

I am learning Beautiful Soup in Python.

I am trying to parse a simple webpage with a list of books.

For example:

<a href="https://www.nostarch.com/carhacking">The Car Hacker’s Handbook</a>

I use the code below.

import requests, bs4
res = requests.get('http://nostarch.com')
res.raise_for_status()
nSoup = bs4.BeautifulSoup(res.text,"html.parser")
elems = nSoup.select('.product-body a')

#elems[0] gives
<a href="https://www.nostarch.com/carhacking">The Car Hacker\u2019s Handbook</a>

And

#elems[0].getText() gives
u'The Car Hacker\u2019s Handbook'

But I want the proper text, which is what print gives:

s = elems[0].getText()
print s
>>>The Car Hacker’s Handbook

How can I modify my code so that it gives "The Car Hacker’s Handbook" instead of "u'The Car Hacker\u2019s Handbook'"?

Kindly help.

CS_noob asked Apr 14 '16 12:04




1 Answer

Have you tried using the encode method?

elems[0].getText().encode('utf-8')

More information about Unicode in Python can be found at https://docs.python.org/2/howto/unicode.html
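
As a rough illustration of why the two outputs in the question differ: the u'...\u2019...' form is only the escaped representation that the interpreter echoes, while the string itself already holds the real apostrophe. The sketch below assumes Python 3 (the question uses Python 2, where print is a statement) and uses a hard-coded copy of the title instead of fetching the page.

```python
# Python 3 sketch; the title is copied from the question, not scraped.
title = "The Car Hacker\u2019s Handbook"

# ascii() returns the escaped representation, similar to what the
# Python 2 interpreter echoed for the unicode object.
escaped = ascii(title)

# The string itself already contains the real character U+2019.
has_curly_apostrophe = "\u2019" in title

# encode() converts the text to UTF-8 bytes; decode() converts it back.
encoded = title.encode("utf-8")
round_tripped = encoded.decode("utf-8")

print(escaped)   # escaped representation, with \u2019
print(title)     # the text itself, with the real apostrophe
print(encoded)   # the UTF-8 byte string
```

In other words, nothing needs to be converted to obtain the text: printing the string (or writing it to a file with a suitable encoding) shows the character, whereas echoing the object shows its representation.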

Moreover, to check whether your string is really UTF-8 encoded, you can use chardet:

>>> import chardet
>>> chardet.detect(elems[0].getText()) 
{'confidence': 0.5, 'encoding': 'utf-8'}
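
chardet guesses an encoding from byte strings. As a standard-library alternative (a different technique, not chardet itself), a strict decode attempt can tell whether some bytes are valid UTF-8. This is a minimal sketch assuming Python 3; the helper name is_valid_utf8 is hypothetical, and the sample bytes are built from the title in the question.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


title = "The Car Hacker\u2019s Handbook"

# The same text encoded two different ways.
utf8_bytes = title.encode("utf-8")
cp1252_bytes = title.encode("cp1252")

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(cp1252_bytes))
```

A successful decode only shows that the bytes are valid UTF-8, not that UTF-8 was the encoding the author intended, so this check is a sanity test and not a detector.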
mschuh answered Oct 02 '22 15:10