'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte

Question

I am trying to read and print the following file: txt.tsv (from https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2017q3_notes.zip)

According to the SEC, the data set is provided in a single encoding, as follows:

Tab Delimited Value (.txt): utf-8, tab-delimited, \n-terminated lines, with the first line containing the field names in lowercase.

My current code:

import csv

with open('txt.tsv') as tsvfile:
    reader = csv.DictReader(tsvfile, dialect='excel-tab')
    for row in reader:
        print(row)

All attempts ended with the following error message:

'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte

I am a bit lost. Can anyone help me? Many thanks in advance.

koPytok · Accepted Answer

Despite what the documentation says, the file appears to be encoded as 'windows-1252', not UTF-8. Pass the encoding explicitly when opening it:

open('txt.tsv', encoding='windows-1252')
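
To see why this helps, here is a minimal sketch rather than a definitive fix. It assumes a tiny stand-in file: the `sample` bytes, the column names, and the `read_rows` helper are made up for illustration and are not taken from the SEC data. The byte 0xa0 is a continuation byte in UTF-8, so it can never start a character, while in windows-1252 it is a non-breaking space.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for txt.tsv: tab-delimited, with one 0xa0 byte in a value.
sample = b"adsh\tvalue\n0001\tNet\xa0income\n"

with tempfile.NamedTemporaryFile(suffix=".tsv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name


def read_rows(path, encoding):
    """Read a tab-delimited file into a list of dicts using the given encoding."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, encoding=encoding, newline="") as tsvfile:
        return list(csv.DictReader(tsvfile, dialect="excel-tab"))


try:
    read_rows(path, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    # 0xa0 cannot start a UTF-8 sequence, hence "invalid start byte".
    utf8_failed = True

# The same bytes decode cleanly as windows-1252 (0xa0 becomes U+00A0).
rows = read_rows(path, "windows-1252")
os.remove(path)

print(utf8_failed)
print(rows)
```

If the decoded text still looks wrong in places, the file may be in yet another encoding; windows-1252 is only the one that matched this particular data set.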

Hasim D · Answer

If you are working with Turkish data, I suggest this line instead:

import pandas as pd

df = pd.read_csv("text.txt", encoding='windows-1254')
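
As a rough illustration of why the code page matters for Turkish text, the sketch below decodes the same bytes with both encodings. The byte string is a made-up example (the Turkish word "ışık"), not taken from any file in this thread.

```python
# Hypothetical bytes: the Turkish word "ışık" as it would be stored in windows-1254.
raw = b"\xfd\xfe\xfdk"

as_turkish = raw.decode("windows-1254")  # dotless i and s-cedilla decode correctly
as_western = raw.decode("windows-1252")  # same bytes, but read as Western European letters

print(as_turkish)
print(as_western)
```

Both calls succeed without raising an error, so a wrong guess between these two code pages shows up as wrong letters rather than as an exception.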

raj kumar · Answer

import pandas as pd

ds = pd.read_csv('/Dataset/test.csv', encoding='windows-1252')

Works fine for me, thanks.

Ghulam Dastgeer · Answer

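I had the same error message with a .csv file, and this worked for me (note that 'ANSI' is an alias for the Windows-only 'mbcs' codec, so it is not available on other platforms):

import pandas as pd

df = pd.read_csv('Text.csv', encoding='ANSI')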

Suresh Gautam · Answer

I ran into the same issue, and using the latin1 encoding resolved it. Give this a try if the solutions above don't work, and adapt the sample below to your codebase (missing stands for your own list of NA markers):

import pandas as pd

df = pd.read_csv("../CSV_FILE.csv", na_values=missing, encoding='latin1')
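
One caveat worth knowing, sketched below with made-up bytes: latin1 maps every possible byte to a character, so it never raises a decode error, but bytes in the range 0x80 to 0x9f come out as invisible control characters instead of the curly quotes and similar symbols that windows-1252 assigns to them. The 0xa0 byte from the original error decodes to the same non-breaking space in both.

```python
# Hypothetical bytes: curly quotes (0x93, 0x94) around text containing 0xa0.
raw = b"\x93Net\xa0income\x94"

as_cp1252 = raw.decode("windows-1252")  # quotes become U+201C and U+201D
as_latin1 = raw.decode("latin1")        # quotes become control chars U+0093 and U+0094

# The 0xa0 byte (index 4) is a non-breaking space under either encoding.
same_nbsp = as_cp1252[4] == as_latin1[4] == "\u00a0"

print(repr(as_cp1252))
print(repr(as_latin1))
print(same_nbsp)
```

So latin1 is a reasonable last resort for getting past the error, but windows-1252 is the safer guess when the file came from a Windows tool and contains typographic punctuation.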

Tags: python, csv, encoding, utf-8

Vital
