Weird problem with input encoding in IPython

I'm running Python 2.6 with the latest IPython on Windows XP SP3, and I have two problems. The first is that under IPython I cannot input Unicode strings directly and, as a result, cannot open files with non-Latin names. Let me demonstrate. In the regular Python interpreter this works:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'mbcs'
>>> fd = open(u'm:/Блокнот/home.tdl')
>>> print u'm:/Блокнот/home.tdl'
m:/Блокнот/home.tdl
>>>

The path contains Cyrillic characters, by the way. Under IPython I get:

In [49]: sys.getdefaultencoding()
Out[49]: 'ascii'

In [50]: sys.getfilesystemencoding()
Out[50]: 'mbcs'

In [52]: fd = open(u'm:/Блокнот/home.tdl')
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)

C:\Documents and Settings\andrey\<ipython console> in <module>()

IOError: [Errno 2] No such file or directory: u'm:/\x81\xab\xae\xaa\xad\xae\xe2/home.tdl'

In [53]: print u'm:/Блокнот/home.tdl'
-------------->print(u'm:/Блокнот/home.tdl')
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (15, 0))

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)

C:\Documents and Settings\andrey\<ipython console> in <module>()

C:\Program Files\Python26\lib\encodings\cp866.pyc in encode(self, input, errors)
     10
     11     def encode(self,input,errors='strict'):
---> 12         return codecs.charmap_encode(input,errors,encoding_map)
     13
     14     def decode(self,input,errors='strict'):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-9: character maps to <und

In [54]:

The second problem is less frustrating, but still a nuisance. When I pass the file name as a non-Unicode (byte) string, the file does not open. I have to explicitly decode the string from the OEM charset before I can open the file, which is pretty inconvenient:

>>> fd2 = open('m:/Блокнот/home.tdl'.decode('cp866'))
>>>

Maybe it has something to do with my regional settings, I don't know, because I can't even cut and paste Cyrillic text from the console. I've set "Russian" everywhere in the regional settings, but it does not seem to help.

asked Feb 14 '10 10:02

Andrey Balaguta

2 Answers

Yes. Typing Unicode at the console is always problematic and generally best avoided, but IPython is particularly broken. It converts the characters you type at its console as if they were encoded in ISO-8859-1, regardless of the actual encoding you're giving it.

For now, you'll have to say u'm:/\u0411\u043b\u043e\u043a\u043d\u043e\u0442/home.tdl'.
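
The garbled name in the question's IOError is consistent with this diagnosis: it is what the console's bytes look like when they are decoded as ISO-8859-1. Here is a minimal sketch of that round trip, assuming the console code page is cp866 (suggested by the cp866.pyc frame in the question's traceback). The variable names are only illustrative.

```python
# The folder name from the question, written with escapes so the
# source file itself stays pure ASCII.
name = u'\u0411\u043b\u043e\u043a\u043d\u043e\u0442'

# Bytes the console would send, assuming its code page is cp866.
console_bytes = name.encode('cp866')

# Decoding those bytes as ISO-8859-1 maps every byte to the code point
# with the same number, which yields the string seen in the IOError.
misdecoded = console_bytes.decode('latin-1')
assert misdecoded == u'\x81\xab\xae\xaa\xad\xae\xe2'

# The damage is reversible: re-encode as ISO-8859-1, then decode with
# the code page that was really in use.
recovered = misdecoded.encode('latin-1').decode('cp866')
assert recovered == name
```

Because ISO-8859-1 maps all 256 byte values, the wrong decode never raises an error, which is why the problem only surfaces later as a missing file.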

answered Oct 10 '22 22:10

bobince

Perversely enough, this will work:

fd = open('m:/Блокнот/home.tdl')

Or:

fd = open('m:/Блокнот/home.tdl'.encode('utf-8'))

This gets around IPython's bug by inputting the string as a raw UTF-8 encoded byte-string. IPython doesn't try any funny business with it. You're then free to decode it into a unicode string if you like, and get on with your life.
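
The decoding step could look roughly like the sketch below. It assumes the bytes arrive in the console's own encoding, which the question's `.decode('cp866')` call suggests is cp866 rather than UTF-8 on that machine. `console_bytes_to_unicode` is a hypothetical helper name, not part of Python or IPython.

```python
import sys

def console_bytes_to_unicode(raw, encoding=None):
    """Decode a byte string typed at the console into a unicode string."""
    if encoding is None:
        # sys.stdin.encoding can be None when stdin is not a console.
        # The cp866 fallback is an assumption taken from the question.
        encoding = getattr(sys.stdin, 'encoding', None) or 'cp866'
    return raw.decode(encoding)

# The path from the question as cp866 bytes, written with escapes.
raw_path = b'm:/\x81\xab\xae\xaa\xad\xae\xe2/home.tdl'
path = console_bytes_to_unicode(raw_path, 'cp866')
# 'path' is now a unicode string that can be passed to open().
```

Passing the encoding explicitly, as in the usage line, avoids relying on whatever `sys.stdin.encoding` reports in a given environment.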

answered Oct 10 '22 20:10

David Eyk

Tags: python, filesystems, windows, ipython, locale
