UnicodeError: UTF-16 stream does not start with BOM

Tags: python, error-handling, csv

I have trouble reading a csv file with Python. My csv file contains Korean text and numbers.

Below is my Python code.

import csv
import codecs
csvreader = csv.reader(codecs.open('1.csv', 'rU', 'utf-16'))
for row in csvreader:
    print(row)

First, there was a UnicodeDecodeError when I entered the "for row in csvreader" line in the above code.

So I used the code below, and then the problem seemed to be solved.

csvreader = csv.reader(codecs.open('1.csv', 'rU', 'utf-16'))

Then I ran into a NULL byte error, and now I can't figure out what's wrong with the csv file.

[update] I don't think I changed anything from the previous code, but my program now shows "UnicodeError: UTF-16 stream does not start with BOM".

When I open the csv in Excel I can see the table in the proper format (image attached at the bottom), but when I open it in Sublime Text, below is a snippet of what I get.

504b 0304 1400 0600 0800 0000 2100 6322
f979 7701 0000 d405 0000 1300 0802 5b43
6f6e 7465 6e74 5f54 7970 6573 5d2e 786d
6c20 a204 0228 a000 0200 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000

If you need more information about my file, let me know!

I appreciate your help. Thanks in advance :)

[Image: csv file shown in Excel]

[Image: csv file shown in Sublime Text]

asked Mar 19 '18 20:03

Py11

1 Answer

The problem is that your input file apparently doesn’t start with a BOM (a special character that is encoded recognizably differently in little-endian vs. big-endian UTF-16). So you can’t just use “utf-16” as the encoding; you have to explicitly use “utf-16-le” or “utf-16-be”.
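As a rough sketch of what naming the byte order explicitly could look like, assuming Python 3 and a file that really is BOM-less little-endian UTF-16 text (the helper name `read_utf16_csv` is made up for this example):

```python
import csv

def read_utf16_csv(path, encoding="utf-16-le"):
    """Read a BOM-less UTF-16 CSV, naming the byte order explicitly.

    The default of "utf-16-le" is an assumption; pass "utf-16-be"
    if the file turns out to be big-endian.
    """
    # newline="" leaves line-ending handling to the csv module.
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))
```

Something like `for row in read_utf16_csv('1.csv'): print(row)` would then replace the loop in the question.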

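If you don’t do that, codecs has nothing to determine the byte order from: the stream reader gives up with the error you’re seeing, and a decoder that simply assumes an order and gets it wrong will read each code unit backward and get wrong or illegal values.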
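To make the byte-order point concrete, here is a small illustration, assuming Python 3. The sample strings and the helper names are invented for the demo and have nothing to do with the actual file:

```python
def decode_both_ways(text):
    """Encode as BOM-less UTF-16-LE, then decode with each byte order."""
    raw = text.encode("utf-16-le")
    right = raw.decode("utf-16-le")
    # errors="replace" keeps the demo from raising on illegal sequences.
    wrong = raw.decode("utf-16-be", errors="replace")
    return right, wrong

def swapped_decode_fails(text):
    """Return True if decoding with the wrong byte order raises an error."""
    try:
        text.encode("utf-16-le").decode("utf-16-be")
    except UnicodeDecodeError:
        return True
    return False

right, wrong = decode_both_ways("가,1")
print(repr(right), repr(wrong))
# "Ø" is U+00D8; byte-swapped it becomes U+D800, a lone surrogate.
print(swapped_decode_fails("Øa"))
```

With the wrong order, every 16-bit unit has its two bytes swapped, so the text either turns into unrelated characters or hits a value that is not legal on its own.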

If your posted sample starts at an even offset and contains a bunch of ASCII, it’s little-endian, so use the -le version. (But of course it’s better to look at what it actually is than to guess.)
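One way to look at what the file actually is would be to check its first few bytes against known signatures. The sketch below assumes Python 3; the function names and the signature table are illustrative, and the ZIP entry is an addition that is not part of the advice above. For what it's worth, the dump in the question begins with 50 4b 03 04, which is the ZIP local file header signature, and is followed by the name [Content_Types].xml, so this check would flag it as a ZIP container (which is what .xlsx workbooks are) rather than as UTF-16 text.

```python
import codecs

# Leading-byte signatures to look for. The ZIP entry is added for
# illustration; .xlsx workbooks are ZIP containers and start with it.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le with BOM"),
    (codecs.BOM_UTF16_BE, "utf-16-be with BOM"),
    (b"PK\x03\x04", "zip container, not text"),
]

def describe_leading_bytes(head):
    """Return a label for the first matching signature, if any."""
    for signature, label in SIGNATURES:
        if head.startswith(signature):
            return label
    return "no known signature"

def describe_file(path):
    """Read the first bytes of a file in binary mode and classify them."""
    with open(path, "rb") as f:
        return describe_leading_bytes(f.read(4))

# First bytes of the hex dump posted in the question.
posted_dump = bytes.fromhex("504b0304140006000800")
print(describe_leading_bytes(posted_dump))
```

If the result is a ZIP container, no choice of UTF-16 codec will help; the file would need to be re-exported from Excel as CSV or read with a spreadsheet library instead.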

answered Sep 19 '22 08:09

abarnert
