I've generated a huge (6 GB) text file using a Windows command-line program (samtools.exe):
.\samtools.exe mpileup -O bamfile.bam > txtfile.tsv
The generated file is a tab-separated table. When I try to open it with pandas.read_table, I get:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
When I print the first line of the file, it looks like this:
ÿþAL645882 473 N 1 ^!c I 1
Everything is normal except the first character. If I read the file in 'rb' mode, the first byte is indeed 0xff.
I really want to read this table as a pandas DataFrame, but the file is huge. Is there any way to make Python ignore the 0xff byte, or to simply delete that byte from the file?
Thanks in advance!
That looks like a UTF-16 BOM header being misinterpreted:
In [25]: with open("tmp.csv", "wb") as fp:
    ...:     fp.write("a,b\n1,2".encode("utf-16"))
    ...:
In [26]: open("tmp.csv", "rb").read().decode("latin-1")
Out[26]: 'ÿþa\x00,\x00b\x00\n\x001\x00,\x002\x00'
In [27]: print(open("tmp.csv", "rb").read().decode("latin-1"))
ÿþa,b
1,2
So you could try interpreting it as UTF-16:
In [29]: pd.read_csv("tmp.csv")
[...]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
In [30]: pd.read_csv("tmp.csv", encoding='utf-16')
Out[30]:
   a  b
0  1  2
(There are other hacks you could try if it really were only the first two bytes causing problems, such as opening the file and reading past those two bytes. But I suspect that, as in the example above, the file contains null bytes that aren't immediately obvious, so it's best to use the right encoding instead.)
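For a file too large to inspect by eye, the BOM check described above can be scripted. The following is a minimal sketch, not a definitive implementation: `sniff_bom_encoding` and `transcode_to_utf8` are hypothetical helper names, and it assumes the file really does start with a BOM (a file without one falls back to `default`). It uses only the standard library and streams the data, so a 6 GB input is never held in memory at once.

```python
import codecs

# BOM prefixes mapped to a codec name that consumes the BOM while decoding.
# UTF-32 comes first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
# (Assumption: a UTF-16 file whose first character is NUL is not expected here.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(path, default="utf-8"):
    """Guess the encoding from the first bytes of the file (hypothetical helper)."""
    with open(path, "rb") as fp:
        head = fp.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def transcode_to_utf8(src, dst, chunk_chars=1 << 20):
    """Rewrite src as BOM-free UTF-8 in dst, one chunk of text at a time."""
    encoding = sniff_bom_encoding(src)
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, "r", encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)
            if not chunk:
                break
            fout.write(chunk)
    return encoding
```

The name returned by `sniff_bom_encoding` can be passed directly as `encoding=` to `pd.read_csv` (with `sep="\t"` for the table in the question), and `chunksize=` is an option if the whole DataFrame does not fit in memory. Transcoding is only needed if other tools must read the file as UTF-8.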
This worked for me on Windows 7 with Spyder 3.6. Reading the file with the default encoding failed:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 607: invalid start byte
Passing encoding='iso-8859-1' fixed it:
data=pd.read_csv("C:/Users/Manjeesh/all_state_cancer.csv",encoding='iso-8859-1')
data
Out[207]:
s.no user.location \
0 1 Ahmedabad
1 2 Madhya Pradesh, India
2 3 Shahdol (MP)
3 4 Shahdol (MP)
4 5 Ahmedabad
5 6 Bengaluru, India
6 7 Madhya Pradesh, India
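One caveat with this approach: iso-8859-1 maps every possible byte to some character, so it never raises, but that does not mean the text was decoded correctly. The sketch below shows what happens to the 0x85 byte from the error message under two encodings. It assumes, without confirmation from the post, that the file may have been saved on Windows as cp1252; the sample bytes are made up for illustration.

```python
# Made-up sample: ASCII text followed by the byte 0x85 from the error message.
raw = b"wait\x85"

# UTF-8 rejects a lone 0x85 because it is a continuation byte, not a start byte.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# iso-8859-1 always succeeds, but 0x85 becomes an invisible C1 control character.
as_latin1 = raw.decode("iso-8859-1")

# cp1252 maps 0x85 to the horizontal ellipsis, usually what a Windows editor meant.
as_cp1252 = raw.decode("cp1252")

print(utf8_failed, repr(as_latin1), repr(as_cp1252))
```

If the decoded columns contain stray control characters, trying encoding='cp1252' in the read_csv call is worth a test. Note that cp1252 leaves a few bytes undefined, so unlike iso-8859-1 it can still raise on some inputs.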