Python UnicodeDecodeError

I am writing a Python program that reads the output of the DOS tree command, which I redirect to a text file. On the 533rd iteration of the loop, Eclipse reports an error:

Traceback (most recent call last):
  File "E:\Peter\Documents\Eclipse Workspace\MusicManagement\InputTest.py", line 24, in <module>
    input = myfile.readline()
  File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3551: character maps to <undefined>

I have read other posts, but setting the encoding to latin-1 does not resolve the issue: it raises a UnicodeDecodeError on another character, and so does utf-8.

The following is the code:

import os
from Album import *

os.system("tree F:\\Music > tree.txt")

myfile = open('tree.txt')
myfile.readline()
myfile.readline()
myfile.readline()

albums = []
x = 0

while x < 533:
    if not input: break
    input = myfile.readline()
    if len(input) < 14:
        artist = input[4:-1]
    elif input[13] != '-':
        artist = input[4:-1]
    else:
        albums.append(Album(artist, input[15:-1], input[8:12]))
    x += 1

for x in albums:
    print(x.artist + ' - ' + x.title + ' (' + str(x.year) + ')')

pbecker13 asked Jan 31 '13 21:01

1 Answer

You need to figure out which encoding tree.com used; according to this post, that could be any of the MS-DOS code pages.

You could go through each of the MS-DOS encodings; most of them have a codec in the Python standard library. I'd try cp437 and cp850 first; the latter is the MS-DOS predecessor of cp1252, I think.

Pass the encoding to open():

myfile = open('tree.txt', encoding='cp437')
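
If it is not obvious which code page is the right one, a rough way to compare candidates is to decode the same raw bytes with each codec and look at the results side by side. The sketch below is only an illustration: the helper name preview_decodings, the candidate list, and the sample bytes are assumptions, not something taken from the question. Note that cp437 and cp850 map every byte value, so they will never raise; you have to judge by eye whether the box-drawing characters and accented letters come out right.

```python
# Candidate codecs to compare; the list itself is an assumption.
CANDIDATES = ["cp437", "cp850", "cp1252"]


def preview_decodings(raw, encodings=CANDIDATES):
    """Decode the same bytes with each codec for side-by-side comparison.

    Returns a dict mapping codec name to the decoded text, or to a short
    failure note when the codec cannot decode the bytes.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError as exc:
            results[enc] = "<failed: {}>".format(exc.reason)
    return results


# Hypothetical sample: a tree.com-style branch followed by a name that
# contains byte 0x81, the byte cp1252 choked on in the traceback.
sample = b"\xc3\xc4\xc4\xc4M\x81ller"
previews = preview_decodings(sample)
```

For real data, read a chunk with open('tree.txt', 'rb').read() and pass it in place of the sample, then print each entry of the returned dict and pick the codec whose output looks like the folder names you expect.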

You really should look into using os.walk() instead of tree.com for this task, though; it will at least save you from having to deal with encoding issues like this one.
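
A minimal sketch of the os.walk() approach follows. It assumes a layout of root/Artist/"YYYY - Title", which is only a guess based on the slicing in the question's code, and it returns plain tuples rather than the question's Album objects, since that class is not shown. The function name collect_albums is hypothetical.

```python
import os


def collect_albums(root):
    """Return (artist, year, title) tuples for root/Artist/'YYYY - Title'.

    The directory layout is an assumption inferred from the question.
    """
    albums = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        artist = os.path.relpath(dirpath, root)
        if artist == os.curdir:
            continue  # the root itself: its subdirectories are the artists
        for name in dirnames:
            year, sep, title = name.partition(" - ")
            if sep and year.isdigit():
                albums.append((artist, int(year), title))
        dirnames[:] = []  # do not descend into the album folders
    return albums
```

Called as collect_albums("F:\\Music"), this would replace both the os.system() call and the line-by-line parsing, and the names arrive as ordinary Python strings with no code page to guess.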

Martijn Pieters answered Oct 26 '22 11:10