Reading UTF-8 Encoded Files and Text Files in Python3

Tags: python-3.x, unicode, utf-8

OK, so Python 3 and Unicode. I know that all Python 3 strings are Unicode strings and that Python 3 source code is treated as UTF-8 by default. But how does Python 3 read text files? Does it assume that they are encoded in UTF-8? Do I need to call decode('utf-8') when reading a text file? What about pandas read_csv() and to_csv()?

398

asked Dec 22 '17 23:12

Bella Dubrov

1 Answer

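You don't need to call decode('utf-8') on a file opened in text mode: open() decodes the bytes for you and returns str. Which encoding it uses is controlled by the optional encoding parameter of the built-in open() function: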

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
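As a minimal sketch of what that means in practice (the file name and sample text below are made up for illustration), passing encoding explicitly makes the result independent of the platform default, and decode() only comes into play when the file is opened in binary mode:

```python
import locale
import os
import tempfile

# The platform-dependent default mentioned in the quoted documentation.
print(locale.getpreferredencoding(False))

text = "naïve café"

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Write and read with an explicit encoding so nothing depends on the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    read_back = f.read()  # already a str; no .decode() needed

# Binary mode returns bytes; this is the only case where decode() applies.
with open(path, "rb") as f:
    raw = f.read()

assert read_back == text
assert raw.decode("utf-8") == text
```

If the encoding argument is left out, the same code may behave differently on systems whose preferred encoding is not UTF-8, which is why spelling it out is usually the safer habit.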

An analogous parameter exists in pandas:

pandas.read_csv(): encoding: str, default None. Encoding to use for UTF when reading/writing (ex. ‘utf-8’).
Series.to_csv(): encoding: string, optional. A string representing the encoding to use if the contents are non-ascii, for python versions prior to 3.
DataFrame.to_csv(): encoding: string, optional. A string representing the encoding to use in the output file, defaults to ‘ascii’ on Python 2 and ‘utf-8’ on Python 3.
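A short round-trip sketch with these parameters, assuming pandas is installed (the column names, values, and file name are invented for the example):

```python
import os
import tempfile

import pandas as pd

# Illustrative data containing non-ASCII characters.
df = pd.DataFrame({"name": ["Zoë", "José"], "score": [1, 2]})

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "people.csv")

# Pass encoding on both sides so the writer and the reader agree.
df.to_csv(path, index=False, encoding="utf-8")
loaded = pd.read_csv(path, encoding="utf-8")

assert loaded["name"].tolist() == ["Zoë", "José"]
assert loaded["score"].tolist() == [1, 2]
```

Naming the encoding in both calls is mostly about being explicit: if a CSV comes from another tool and is not UTF-8, read_csv() needs that file's real encoding instead.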

193

answered Dec 20 '22 19:12

JosefZ
