For some reason, Python seems to have trouble with the BOM when reading unicode strings from a UTF-8 file. Consider the following:
with open('test.py') as f:
    for line in f:
        print unicode(line, 'utf-8')
Seems straightforward, doesn't it?
That's what I thought until I ran it from the command line and got:
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>
A quick Google search revealed that the BOM has to be stripped manually:
import codecs

with open('test.py') as f:
    for line in f:
        print unicode(line.replace(codecs.BOM_UTF8, ''), 'utf-8')
This one runs fine. However, I'm struggling to see any merit in this.
Is there a rationale behind the behavior described above? By contrast, UTF-16 works seamlessly.
UTF-8 is a byte-oriented encoding: each character is represented by a specific sequence of one or more bytes.
There is no official difference between UTF-8 and BOM-ed UTF-8. A BOM-ed UTF-8 stream simply starts with the three bytes EF BB BF, and those bytes, if present, must be ignored when extracting the string from the file/stream.
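As a minimal sketch, here is one way to strip that signature by hand before decoding (reusing the test.py name from the question and assuming the file is small enough to read at once):

import codecs

with open('test.py', 'rb') as f:
    data = f.read()

# Drop the optional UTF-8 signature (EF BB BF) before decoding
if data.startswith(codecs.BOM_UTF8):
    data = data[len(codecs.BOM_UTF8):]

text = data.decode('utf-8')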
In Python 2, a plain str is just a sequence of 8-bit bytes, so reading the file gives you the raw UTF-8 bytes (including the BOM) and you have to decode them yourself. A unicode string, by contrast, is a sequence of Unicode code points, which allows for a much wider set of characters, including special characters from most languages in the world.
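A small illustration of the difference, using hypothetical byte values and Python 2 syntax as in the question:

raw = '\xef\xbb\xbfhello'        # str: raw UTF-8 bytes, starting with a BOM
text = raw.decode('utf-8')       # unicode: BOM survives as u'\ufeff'
clean = raw.decode('utf-8-sig')  # unicode: BOM consumed by the codec
print repr(text), repr(clean)    # u'\ufeffhello' u'hello'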
The 'utf-8-sig' encoding will consume the BOM signature on your behalf.
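For example, a sketch that reopens the file from the question with that codec (using codecs.open, which takes an encoding argument in Python 2):

import codecs

with codecs.open('test.py', encoding='utf-8-sig') as f:
    for line in f:
        # line is already unicode and the BOM has been consumed,
        # so no manual replace() is needed
        print line.rstrip()

Since u'\ufeff' never reaches print, the original charmap error disappears, provided the rest of the file's characters are representable in the console's encoding.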