When importing a csv file I am getting an error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 15: invalid start byte
traceback:
Traceback (most recent call last):
File "<ipython-input-2-99e71d524b4b>", line 1, in <module>
runfile('C:/AppData/FinRecon/py_code/python3/DataJoin.py', wdir='C:/AppData/FinRecon/py_code/python3')
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 500, in <module>
M5()
File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 221, in M5
s3 = pd.read_csv(working_dir+"S3.csv", sep=",") #encode here encoding='utf-16
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
data = parser.read(nrows)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
ret = self._engine.read(nrows)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1995, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 991, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1123, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1176, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1299, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1315, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1553, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 15: invalid start byte
What i've tried:
`s3 = pd.read_csv(working_dir+"S3.csv", sep=",", encoding='utf-16')`
I get error UnicodeError: UTF-16 stream does not start with BOM
What can be done to get this file to be read properly?
Try using s3 = pd.read_csv(working_dir+"S3.csv", sep=",", encoding='Latin-1')
Mostly encoding issues arise with the characters within the data. While utf-8 supports all languages according to pandas' documentation, utf-8 has a byte structure that must be respected at all times. Some of the values not included in utf-8 are latin small letters i with diaeresis, right-pointing double angle quotation mark, inverted question mark. This are mapped as 0xef, 0xbb and 0xbf bytes respectively. Hence your error.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With