
IPython Notebook: What is the default encoding?


I have created a package whose source files are encoded in UTF-8.

One of its functions returns a DataFrame with a column containing UTF-8-encoded strings.

In IPython at the command line, I can display the contents of this table without any problems. In the Notebook, however, displaying it fails with the error 'utf8' codec can't decode byte 0xe7. The full traceback is below.

What is the proper encoding to use with the Notebook?

```
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-13-92c0011919e7> in <module>()
      3 ver = verif.VerificacaoNA()
      4 comp, total = ver.executarCompRealFisica(DT_INI, DT_FIN)
----> 5 comp

c:\Python27-32\lib\site-packages\ipython-0.13.1-py2.7.egg\IPython\core\displayhook.pyc in __call__(self, result)
    240             self.update_user_ns(result)
    241             self.log_output(format_dict)
--> 242             self.finish_displayhook()
    243 
    244     def flush(self):

c:\Python27-32\lib\site-packages\ipython-0.13.1-py2.7.egg\IPython\zmq\displayhook.pyc in finish_displayhook(self)
     59         sys.stdout.flush()
     60         sys.stderr.flush()
---> 61         self.session.send(self.pub_socket, self.msg, ident=self.topic)
     62         self.msg = None
     63 

c:\Python27-32\lib\site-packages\ipython-0.13.1-py2.7.egg\IPython\zmq\session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, subheader, track, header)
    557 
    558         buffers = [] if buffers is None else buffers
--> 559         to_send = self.serialize(msg, ident)
    560         flag = 0
    561         if buffers:

c:\Python27-32\lib\site-packages\ipython-0.13.1-py2.7.egg\IPython\zmq\session.pyc in serialize(self, msg, ident)
    461             content = self.none
    462         elif isinstance(content, dict):
--> 463             content = self.pack(content)
    464         elif isinstance(content, bytes):
    465             # content is already packed, as in a relayed message

c:\Python27-32\lib\site-packages\ipython-0.13.1-py2.7.egg\IPython\zmq\session.pyc in <lambda>(obj)
     76 
     77 # ISO8601-ify datetime objects
---> 78 json_packer = lambda obj: jsonapi.dumps(obj, default=date_default)
     79 json_unpacker = lambda s: extract_dates(jsonapi.loads(s))
     80 

c:\Python27-32\lib\site-packages\pyzmq-13.0.0-py2.7-win32.egg\zmq\utils\jsonapi.pyc in dumps(o, **kwargs)
     70         kwargs['separators'] = (',', ':')
     71 
---> 72     return _squash_unicode(jsonmod.dumps(o, **kwargs))
     73 
     74 def loads(s, **kwargs):

c:\Python27-32\lib\json\__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, **kw)
    236         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237         separators=separators, encoding=encoding, default=default,
--> 238         **kw).encode(obj)
    239 
    240 

c:\Python27-32\lib\json\encoder.pyc in encode(self, o)
    199         # exceptions aren't as detailed.  The list call should be roughly
    200         # equivalent to the PySequence_Fast that ''.join() would do.
--> 201         chunks = self.iterencode(o, _one_shot=True)
    202         if not isinstance(chunks, (list, tuple)):
    203             chunks = list(chunks)

c:\Python27-32\lib\json\encoder.pyc in iterencode(self, o, _one_shot)
    262                 self.key_separator, self.item_separator, self.sort_keys,
    263                 self.skipkeys, _one_shot)
--> 264         return _iterencode(o, 0)
    265 
    266 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 199: invalid continuation byte
```
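The last line of the traceback can be reproduced without IPython. The Notebook sends output to the browser as JSON, and in Python 2 `json.dumps` decodes byte strings with its `encoding` argument, which defaults to `'utf-8'`. A minimal Python 3 sketch, assuming (the question does not confirm this) that the column really holds Latin-1/cp1252 bytes: in those single-byte encodings 0xE7 is "ç", while in UTF-8 it can only start a three-byte sequence, so a following non-continuation byte triggers exactly this error. The sample word is made up.

```python
# Hypothetical sample value: "verificação" stored as Latin-1 bytes,
# where "ç" is the single byte 0xE7 and "ã" is 0xE3.
raw = "verificação".encode("latin-1")

try:
    # Decoding as UTF-8 is what the JSON serializer effectively attempts.
    raw.decode("utf-8")
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("latin-1")
print(ascii(text))  # ascii() keeps the output printable on any console
```

If this reproduces the message, the fix is to decode the data with its real encoding rather than to change which encoding the Notebook expects.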
Asked by Adriano Almeida on Mar 14 '13 at 21:03


People also ask

What is default Python encoding?

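UTF-8 is one of the most commonly used encodings. Python 3 uses it as the default string encoding (sys.getdefaultencoding() returns 'utf-8') and as the default source-file encoding, whereas Python 2 defaults to ASCII. UTF stands for “Unicode Transformation Format”, and the '8' means that the encoding uses 8-bit code units: each character is stored as one to four bytes. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.)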
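To make the 8-bit point concrete, a short Python 3 sketch that encodes a few characters (chosen arbitrarily) and counts the bytes UTF-8 needs for each:

```python
# "A" (U+0041), "ç" (U+00E7), "€" (U+20AC) and an emoji (U+1F600)
samples = ["A", "\u00e7", "\u20ac", "\U0001F600"]

# Number of 8-bit bytes UTF-8 uses for each character.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    # Print the code point, the raw UTF-8 bytes and the byte count.
    print(hex(ord(ch)), ch.encode("utf-8"), utf8_lengths[ch], "byte(s)")
```

ASCII characters keep their single-byte form, which is why ASCII text is already valid UTF-8, while "ç" becomes the two bytes C3 A7.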

Is UTF-8 the default encoding?

The way I read the spec, UTF-8 is not unconditionally the default encoding for XML. It is only the default encoding "for an entity which begins with neither a Byte Order Mark nor an encoding declaration".
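Python's standard XML parser follows that rule. A small sketch with made-up one-element documents: without a BOM or declaration the bytes are read as UTF-8, and an `encoding` declaration switches the parser to another encoding.

```python
import xml.etree.ElementTree as ET

# No BOM and no encoding declaration: the parser assumes UTF-8,
# so the two bytes C3 A7 are read as "ç".
no_decl = ET.fromstring(b"<word>\xc3\xa7</word>")

# With an encoding declaration, the single byte E7 is read as "ç" instead.
with_decl = ET.fromstring(
    b'<?xml version="1.0" encoding="ISO-8859-1"?><word>\xe7</word>'
)

# The same Latin-1 byte without a declaration is not valid UTF-8, so parsing fails.
try:
    ET.fromstring(b"<word>\xe7</word>")
    latin1_without_decl_ok = True
except ET.ParseError:
    latin1_without_decl_ok = False

print(ascii(no_decl.text), ascii(with_decl.text), latin1_without_decl_ok)
```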

What format is Jupyter notebooks stored in?

Jupyter (né IPython) notebook files are simple JSON documents containing text, source code, rich media output, and metadata. Each segment of the document is stored in a cell.
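A simplified sketch of what such a file looks like, built by hand in the nbformat 4 layout (a real notebook carries more metadata; the cell contents here are invented). It serializes the dictionary to JSON, stores it as UTF-8 bytes, and reads it back with ordinary JSON parsing:

```python
import json

# Hand-built, minimal notebook in nbformat 4 layout (illustrative only).
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Verificação de dados"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["comp"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# ensure_ascii=False keeps "ç" and "ã" as characters instead of \u escapes.
text = json.dumps(notebook, ensure_ascii=False, indent=1)
data = text.encode("utf-8")  # bytes as they would be written to disk

# Reading the file back is plain JSON parsing.
loaded = json.loads(data.decode("utf-8"))
print(sorted(loaded))
```

Because everything in the file, including cell outputs, passes through JSON, any byte string that cannot be decoded as text breaks serialization, which is the failure shown in the question's traceback.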

Why is UTF-8 a good choice for the default editor encoding in Python?

As a content author or developer, you should nowadays always choose the UTF-8 character encoding for your content or data. This Unicode encoding is a good choice because you can use a single character encoding to handle any character you are likely to need. This greatly simplifies things.

1 Answer

I had the same problem recently, and indeed setting the default encoding to UTF-8 did the trick:

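```python
import sys

# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# and reloading the built-in sys module makes it available again.
reload(sys)
sys.setdefaultencoding("utf-8")  # implicit str <-> unicode conversions now use UTF-8
```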

Running sys.getdefaultencoding() returned 'ascii' in my environment (Python 2.7.3); ASCII is the default in Python 2.
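To see which encodings are actually in play in a given environment, a short sketch written for Python 3 (the answer used Python 2.7.3). In Python 3 the default encoding is fixed at UTF-8 and `sys.setdefaultencoding` no longer exists, so the `reload(sys)` trick does not apply there.

```python
import locale
import sys

# Always 'utf-8' in Python 3; 'ascii' by default in Python 2.
default_encoding = sys.getdefaultencoding()

# Removed in Python 3, so this is False there.
has_setter = hasattr(sys, "setdefaultencoding")

print("default:", default_encoding)
print("filesystem:", sys.getfilesystemencoding())
print("locale preferred:", locale.getpreferredencoding(False))
print("stdout:", getattr(sys.stdout, "encoding", None))
print("setdefaultencoding available:", has_setter)
```

The locale and stdout values vary by platform and terminal, so they are printed rather than assumed.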

Also see this related question and Ian Bicking's blog post on the subject.
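An alternative often recommended instead of changing the process-wide default is to decode byte strings explicitly where they enter the program, so only text reaches the display layer. A minimal Python 3 sketch; `to_text` is a hypothetical helper, the sample values are made up, and the cp1252 fallback is an assumption (the Windows paths in the traceback make it plausible, and 0xE7 is "ç" there), not something the question confirms.

```python
def to_text(value, encodings=("utf-8", "cp1252")):
    """Return value as str, decoding bytes with the first encoding that works."""
    if not isinstance(value, bytes):
        return value  # already text, leave it alone
    for enc in encodings:
        try:
            return value.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: mark undecodable bytes with U+FFFD instead of failing.
    return value.decode(encodings[0], errors="replace")


# Made-up column mixing UTF-8 bytes, cp1252 bytes and an existing str.
column = [b"Jo\xc3\xa3o", b"Concei\xe7\xe3o", "already text"]
cleaned = [to_text(v) for v in column]
print(ascii(cleaned))
```

With pandas, the same helper could be applied to a DataFrame column through `Series.map`, so the display code never sees raw bytes.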

Answered by assaflavi on Feb 18 '23 at 22:02
