Using unicode character u201c

I'm new to Python and am having trouble understanding Unicode. I'm using Python 3.4. I've spent an entire day trying to figure this out by reading about Unicode, including http://www.fileformat.info/info/unicode/char/201C/index.htm and http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html

I need to refer to special quotes because they are used in the text I'm analyzing. I verified that the Windows 7 command window can read and write the two special quote characters. To keep things simple, I wrote a one-line script:

print ('“') # that's the special quote mark in between normal single quotes

and get this output:

Traceback (most recent call last):
  File "C:\Users\David\Documents\Python34\Scripts\wordCount3.py", line 1, in <module>
    print ('\u201c')
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 0: character maps to <undefined>

So how do I write something that refers to these two characters, U+201C and U+201D?

Is this the correct encoding choice in the file open statement?

with open(fileIn, mode='r', encoding='utf-8', errors='replace') as f:

David Q asked Oct 31 '22 09:10


1 Answer

Keep in mind that in Python 3.x you can't simply mix Unicode strings with byte strings. You have probably been reading manuals written for Python 2.x, where that is possible as long as the byte string contains only convertible characters.

print('\u201c', '\u201d')

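works fine for me, so the likely cause is that you're using the wrong encoding for the source file or the terminal. Your traceback goes through encodings\cp437.py, which means the output is being encoded with code page 437, and that code page has no mapping for U+201C.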
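
To check which side is at fault, it can help to try the encoding step by hand. This is only a minimal sketch, assuming the console codec is cp437 as the traceback suggests; the helper name can_encode is made up for illustration:

```python
import sys

def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

quotes = '\u201c\u201d'  # left and right double quotation marks

# cp437 (the old DOS code page) has no slot for these characters,
# while UTF-8 can represent any Unicode code point.
print(can_encode(quotes, 'cp437'))
print(can_encode(quotes, 'utf-8'))

# The encoding print() actually uses for this console (may be None
# when output is not attached to a normal stream).
print(sys.stdout.encoding)
```

If the last line reports cp437 (or another legacy code page), the string itself is fine and only the console output step is failing.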

You can also tell Python explicitly which encoding your source file uses by putting the following line at the top of the file:

 # -*- coding: utf-8 -*-
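
Independently of how the source file is encoded, the two characters can be written with pure-ASCII escapes, so the script never has to contain the characters themselves. A small sketch of the equivalent spellings (the variable names are arbitrary):

```python
import unicodedata

left = '\u201c'    # escape with four hex digits
right = '\u201d'

# chr() with the code point and the \N{...} named escape give the same strings.
same_left = (left == chr(0x201C) == '\N{LEFT DOUBLE QUOTATION MARK}')
same_right = (right == chr(0x201D) == '\N{RIGHT DOUBLE QUOTATION MARK}')
print(same_left, same_right)

# ascii() escapes non-ASCII characters, so these lines print safely
# even on a console that cannot display the quotes.
print(ascii(left), unicodedata.name(left))
print(ascii(right), unicodedata.name(right))
```

Any of these forms can be used in comparisons or searches, for example text.count('\u201c').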

Added: it seems you're working on a Windows machine. If so, you can change your console code page to UTF-8 by running

chcp 65001

before you start the Python interpreter. That change is temporary; if you want it to be permanent, run the following .reg file:

Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Console]
"CodePage"=dword:fde9
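
If changing the console code page is not an option, another possibility is to escape whatever the console cannot display before printing it. The following is just a sketch under that assumption; console_safe is a hypothetical helper, not part of the standard library:

```python
import sys

def console_safe(text, encoding=None, errors='backslashreplace'):
    """Round-trip text through an encoding, escaping characters it cannot represent."""
    # Fall back to the console's encoding, or ASCII if none is reported.
    encoding = encoding or sys.stdout.encoding or 'ascii'
    return text.encode(encoding, errors=errors).decode(encoding)

# On a cp437 console the quotes come out as \u201c and \u201d escapes
# instead of raising UnicodeEncodeError; on a UTF-8 console they print as-is.
print(console_safe('\u201cquoted\u201d'))
```

Passing errors='replace' instead would print a question mark for each character the console cannot show.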

thodnev answered Nov 09 '22 22:11