 

python 3.0 open() default encoding

I am trying to count the lines in a JSON file (the file was linked in the original question).

I tried the code below to count the lines.

```python
input = open("json/world_bank.json")
i = 0
for l in input:
    i += 1
print(i)
```

But the code above throws a UnicodeDecodeError, as shown below.

```
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-17-edc88ade7225> in <module>()
      2 
      3 i=0
----> 4 for l in input:
      5     i+=1
      6 

C:\Users\Subbi Reddy\AppData\Local\Continuum\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3979: character maps to <undefined>
```
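The same failure can be reproduced without the file. We don't know which character actually sits at position 3979, so as a purely hypothetical example take "Á" (U+00C1): its UTF-8 encoding is the two bytes 0xC3 0x81, and 0x81 has no mapping in cp1252.

```python
# Hypothetical sample character: "Á" (U+00C1) is b"\xc3\x81" in UTF-8.
data = "Á".encode("utf-8")
print(data)  # b'\xc3\x81'

# Decoding with the matching encoding round-trips cleanly.
decoded = data.decode("utf-8")
print(decoded == "Á")  # True

# Decoding as cp1252 fails on byte 0x81, which cp1252 leaves undefined.
try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)  # 'charmap' codec can't decode byte 0x81 in position 1: character maps to <undefined>
```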

Then I added the encoding parameter to the open() call, as shown below.

```python
input = open("json/world_bank.json", encoding="utf8")
```

After that change it worked and printed 500.
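For reference, a slightly more idiomatic version of the working code might look like the sketch below: it closes the file with a `with` block, avoids shadowing the built-in `input`, and passes the encoding explicitly. The helper name `count_lines` and the two-line sample file are hypothetical stand-ins for `json/world_bank.json`.

```python
import os
import tempfile


def count_lines(path, encoding="utf-8"):
    """Count the lines of a text file, decoding it with an explicit encoding."""
    # The with-block closes the file even if decoding fails partway through.
    with open(path, encoding=encoding) as f:
        return sum(1 for _ in f)


# Hypothetical two-line sample standing in for json/world_bank.json.
fd, sample_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write('{"country": "Åland"}\n{"country": "Curaçao"}\n')

try:
    line_count = count_lines(sample_path)
    print(line_count)  # 2
finally:
    os.remove(sample_path)
```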

As far as I know, Python's open() should use "utf8" as the default encoding.

Where am I going wrong here?

asked Mar 30 '16 08:03 by Subbi reddy dwarampudi



People also ask

What is the default encoding of open Python?

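For open() in text mode the default is platform dependent: it is whatever locale.getpreferredencoding(False) returns, which is commonly UTF-8 on Linux and macOS but a legacy code page such as cp1252 on many Windows systems. UTF-8 is one of the most commonly used encodings; UTF stands for “Unicode Transformation Format”, and the '8' means that text is encoded in 8-bit units.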
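To make the "8-bit units" part concrete: UTF-8 is variable-width and uses one to four bytes per code point. The sample characters below are arbitrary.

```python
# Arbitrary sample characters: ASCII, Latin-1 range, BMP symbol, astral emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Only ASCII is printed, so this works on any console encoding.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3a9
# U+20AC -> 3 byte(s): e282ac
# U+1F600 -> 4 byte(s): f09f9880
```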

How do I change the default encoding in Python 3?

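sys.setdefaultencoding() was a Python 2 workaround and is not available in Python 3; changing the interpreter-wide default was discouraged anyway because it could break code that relied on the original default (often third-party code, which would generally make fixing it impossible or dangerous). In Python 3, pass encoding= explicitly to open(). Since Python 3.7 you can also enable UTF-8 Mode with the -X utf8 command-line option or the PYTHONUTF8=1 environment variable, which makes open() default to UTF-8.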
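A quick way to see the effect of UTF-8 Mode is to start a child interpreter with and without `-X utf8` and ask it for its preferred encoding. This is a sketch assuming Python 3.7 or later; the helper name `preferred_encoding` is hypothetical.

```python
import subprocess
import sys


def preferred_encoding(utf8_mode):
    """Return locale.getpreferredencoding(False) as seen by a fresh interpreter.

    When utf8_mode is True the child is started with -X utf8 (UTF-8 Mode).
    """
    flag = ["-X", "utf8"] if utf8_mode else []
    code = "import locale; print(locale.getpreferredencoding(False))"
    result = subprocess.run(
        [sys.executable, *flag, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print("default:", preferred_encoding(False))  # e.g. cp1252 on many Windows setups
print("-X utf8:", preferred_encoding(True))   # UTF-8 or utf-8, depending on version
```

Setting `PYTHONUTF8=1` in the environment of the child process should have the same effect as the `-X utf8` flag.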

What encoding does Python 3 use?

Since Python 3.0, strings (str objects) are stored as Unicode: each character in a string is represented by a code point, so a string is just a sequence of Unicode code points. To store such a string in a file or send it over a network, the sequence of code points must be encoded into bytes; in Python 3, str.encode() and bytes.decode() use UTF-8 by default.
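The distinction between code points and encoded bytes shows up directly in lengths. A small illustration with an arbitrary sample word:

```python
text = "caf\u00e9"                       # "café": four code points
print(len(text))                          # 4
print([f"U+{ord(c):04X}" for c in text])  # ['U+0063', 'U+0061', 'U+0066', 'U+00E9']

encoded = text.encode()                   # str.encode() defaults to UTF-8
print(encoded, len(encoded))              # b'caf\xc3\xa9' 5

# bytes.decode() also defaults to UTF-8, so the round trip is lossless.
print(encoded.decode() == text)           # True
```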

What is the default encoding for bytes decode () in Python 3?

bytes.decode() converts a bytes object to a str, and str.encode() does the reverse; in Python 3 both default to 'utf-8'. Both accept an errors argument that selects the error-handling scheme. The default is 'strict', meaning that invalid data raises UnicodeDecodeError when decoding (or UnicodeEncodeError when encoding).
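The built-in error handlers can be compared on a deliberately invalid byte string (a made-up sample; 0xFF never appears in valid UTF-8):

```python
raw = b"ok \xff end"  # hypothetical input containing an invalid UTF-8 byte

strict_reason = None
try:
    raw.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    strict_reason = exc.reason
print("strict: ", strict_reason)        # invalid start byte

replaced = raw.decode("utf-8", errors="replace")
ignored = raw.decode("utf-8", errors="ignore")
# ascii() keeps the output printable on any console encoding.
print("replace:", ascii(replaced))      # 'ok \ufffd end'
print("ignore: ", ascii(ignored))       # 'ok  end'
```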


1 Answer

Python 3's default of UTF-8 applies only to conversions between bytes and str (bytes.decode() and str.encode()). open() instead uses your environment to choose an appropriate encoding:

From the Python 3 docs for open():

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
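One way to check this on your own machine is to open a scratch file without an `encoding=` argument and compare what open() picked with `locale.getpreferredencoding(False)`. This is a sketch; the names are normalized with `codecs.lookup` because the same codec can be reported under different spellings.

```python
import codecs
import locale
import os
import tempfile

# Empty scratch file, used only to inspect what open() chooses.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path) as f:                    # no encoding= argument
        default_name = codecs.lookup(f.encoding).name
    with open(path, encoding="utf-8") as f:  # explicit override
        explicit_name = codecs.lookup(f.encoding).name
finally:
    os.remove(path)

preferred = codecs.lookup(locale.getpreferredencoding(False)).name
print("open() default:", default_name)   # e.g. cp1252 on many Windows setups
print("preferred:     ", preferred)      # same codec as the open() default
print("explicit:      ", explicit_name)  # utf-8
```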

In your case, because you're on Windows with a Western European/North American locale, you get the 8-bit Windows-1252 (cp1252) character set, which is why cp1252.py appears in your traceback. Setting encoding to "utf-8" overrides this.
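Note that a cp1252 default does not always fail loudly. Only the bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D) raise the error seen in the traceback; other UTF-8 sequences decode silently into the wrong characters. A small illustration with an arbitrary sample string:

```python
original = "caf\u00e9"                 # "café"
utf8_bytes = original.encode("utf-8")  # b'caf\xc3\xa9'

# 0xC3 and 0xA9 are both defined in cp1252, so no exception is raised,
# but the result is wrong: "Ã©" instead of "é".
misread = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in misread[3:]])  # ['U+00C3', 'U+00A9']

# Decoding with the file's real encoding restores the original text.
print(utf8_bytes.decode("utf-8") == original)    # True
```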

answered Sep 28 '22 18:09 by Alastair McCormack
