'utf-8' codec can't decode byte 0x89

Tags:

python

csv

I want to read a CSV file and process some columns, but I keep running into problems. I am stuck with the following error:

Traceback (most recent call last):
  File "C:\Users\Sven\Desktop\Python\read csv.py", line 5, in <module>
    for row in reader:
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 446: invalid start byte
>>> 

My Code

import csv
with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv",newline='', encoding="utf8") as f:
    reader = csv.reader(f,delimiter=';',quotechar='|')
    #print(sum(1 for row in reader))
    for row in reader:
        print(row)
        if row:
            value = row[6]
            value = value.replace('(', '')
            value = value.replace(')', '')
            value = value.replace(' ', '')
            value = value.replace('.', '')
            value = value.replace('0032', '0')
            if len(value) > 0:
                print(value + ' Length: ' + str(len(value)))

I'm a beginner with Python. I tried googling, but it is hard to find the right solution.

Can anyone help me out?

Sven asked Nov 30 '14 23:11


2 Answers

I was getting a similar error when trying to read or upload the following kinds of files:

  1. CSV File
  2. JPEG File
  3. PNG File
  4. Zip File

The best way to avoid errors like:

  1. 'utf-8' codec can't decode byte 0x89
  2. 'utf-8' codec can't decode byte 0xff

is to read these files as bytes. When you open a file in binary mode, Python does no decoding, so you do not need to provide an encoding at all. Open them like this:

with open(file_path, 'rb') as file:
    data = file.read()  # data is a bytes object; nothing is decoded

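Or in your case, since csv.reader in Python 3 needs text rather than bytes, the code should be something like the following (cp1252 is only an assumed encoding; use whatever the file really is):

import csv
import io

with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv", 'rb') as f:
    # Wrap the byte stream so csv.reader receives text.
    text = io.TextIOWrapper(f, encoding='cp1252', newline='')
    reader = csv.reader(text, delimiter=';', quotechar='|')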
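
If you are not sure which encoding the file uses, here is a rough sketch of the "read as bytes first" idea: read the raw bytes, try a few candidate encodings, and only then hand the text to csv. The helper names decode_with_fallback and read_rows, the candidate list, and the sample bytes are all made up for illustration; they are not from the question.

```python
import csv
import io


def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate encoding that decodes raw."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def read_rows(raw, delimiter=";", quotechar="|"):
    """Decode raw bytes, then parse them as CSV rows."""
    text, _ = decode_with_fallback(raw)
    # newline="" leaves line endings alone so the csv module can handle them itself.
    buffer = io.StringIO(text, newline="")
    return list(csv.reader(buffer, delimiter=delimiter, quotechar=quotechar))


# Hypothetical sample: byte 0x89 is not a valid UTF-8 start byte,
# but in cp1252 it is the per-mille sign.
sample = b"name;share\r\nSven;5\x89\r\n"
rows = read_rows(sample)
```

With a real file you would get the bytes from open(path, 'rb') and pass f.read() to read_rows. Keep in mind that a fallback encoding that happens to decode without errors is not necessarily the right one, so check the output for odd characters.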

Rohit Agrawal answered Oct 25 '22 19:10


The first byte of a .PNG file is 0x89. Not saying that is your problem, but the .PNG header is specifically designed so that it is NOT accidentally interpreted as text.

Why you would have a .csv file that is actually a .png, I don't know. But it definitely could happen if someone renamed the file by mistake. On Windows 10, every once in a while I accidentally mass-rename files because of their stupid checkbox feature. Why Microsoft decided that desktop machines having identical UI controls to tablets was a good idea... I don't know.
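
A quick way to rule this out is to look at the first bytes of the file. The sketch below compares them with the 8-byte PNG signature; the function names are hypothetical. Note that a renamed PNG would make the decoder fail at position 0, while the traceback in the question reports position 446, so this is only a sanity check.

```python
# The 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(head):
    """Return True if the given leading bytes start with the PNG signature."""
    return head.startswith(PNG_SIGNATURE)


def file_looks_like_png(path):
    """Read just enough of the file, in binary mode, to test its signature."""
    with open(path, "rb") as f:
        return looks_like_png(f.read(len(PNG_SIGNATURE)))
```

Calling file_looks_like_png on the suspicious .csv path tells you whether it is really a PNG with the wrong extension.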

twitchdotcom slash KANJICODER answered Oct 25 '22 20:10