
Python 3: reading UCS-2 (BE) file

I can't decode UCS-2 BE files (legacy stuff) under Python 3.3 using the built-in open() function. The stack trace shows a UnicodeDecodeError and points into my readLine() method. I also couldn't find an encoding value that names UCS-2.

I'm on Windows 8; the terminal is set to code page 65001 and uses the 'Lucida Console' font.

Code snippet won't be of too much help, I guess:

def display_resource():
    f = open(r'D:\workspace\resources\JP.res', encoding=<??tried_several??>)
    while True:
        line = f.readline()
        if len(line) == 0:
            break

I'd appreciate any insight into this issue.

elder elder asked Jan 23 '13 20:01



1 Answer

UCS-2 is effectively UTF-16 here: every code point that UCS-2 can represent (the Basic Multilingual Plane) is encoded identically in UTF-16, so a UTF-16 decoder reads UCS-2 data correctly.

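Open it with encoding='utf16'. That codec relies on the BOM (the byte order mark, 2 bytes at the start of the file; for BE that would be \xfe\xff) to pick the byte order. If there is no BOM, it falls back to the platform's native byte order, which is little-endian on Windows, so use encoding='utf_16_be' to force big-endian.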
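To make the two cases concrete, here is a minimal sketch. The sample text, the temporary file names, and the read_lines() helper are made up for illustration and are not from the question. It writes one big-endian file with a BOM and one without, then reads each back with the matching codec.

```python
import codecs
import os
import tempfile


def read_lines(path, encoding):
    """Hypothetical helper: return the file's lines decoded with `encoding`."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Illustrative sample data; a real resource file would already exist on disk.
text = "日本語\nresource\n"
tmpdir = tempfile.mkdtemp()

# Case 1: big-endian data preceded by a BOM (b'\xfe\xff').
with_bom = os.path.join(tmpdir, "with_bom.res")
with open(with_bom, "wb") as f:
    f.write(codecs.BOM_UTF16_BE + text.encode("utf_16_be"))

# Case 2: the same big-endian data with no BOM at all.
no_bom = os.path.join(tmpdir, "no_bom.res")
with open(no_bom, "wb") as f:
    f.write(text.encode("utf_16_be"))

# 'utf_16' reads the BOM, picks the byte order from it, and strips it.
lines_with_bom = read_lines(with_bom, "utf_16")

# 'utf_16_be' assumes big-endian and expects no BOM.
lines_no_bom = read_lines(no_bom, "utf_16_be")
```

Note that 'utf_16_be' does not strip a BOM: if it is used on a file that does start with one, the first line will begin with the character '\ufeff'.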

Martijn Pieters answered Sep 21 '22 02:09
