UTF-16 file seeking in Python. How?

Tags:

utf-16

For some reason I cannot seek in my UTF-16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code:

import codecs

f = codecs.open(ai_file, 'r', 'utf-16')
seek = self.ai_map[self._cbClass.Text]  # seek is a valid int
f.seek(seek)
while True:
    ln = f.readline().strip()

I tried random things, like first reading something from the stream, but that didn't help. I checked the offset being seeked to in a hex editor: the string starts at a character byte, not a null byte (I guess that's a good sign, right?). So how do I seek in a UTF-16 file in Python?

asked Jul 21 '11 16:07

marrat

1 Answer

Well, the error message is telling you why: it's not reading a byte order mark. The byte order mark is at the beginning of the file. Without having read the byte order mark, the UTF-16 decoder can't know what order the bytes are in. Apparently it does this lazily, the first time you read, instead of when you open the file -- or else it is assuming that the seek() is starting a new UTF-16 stream.
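A minimal reproduction of the behaviour described above, as a sketch: it assumes CPython's standard `codecs` stream reader and uses an in-memory `io.BytesIO` buffer with made-up sample text instead of the asker's file. The reader consumes the BOM on the first read, and the later `seek()` resets it so that it expects a BOM again.

```python
import codecs
import io

# Hypothetical sample data; the plain "utf-16" codec writes a 2-byte BOM first.
data = "first line\nsecond line\n".encode("utf-16")
reader = codecs.getreader("utf-16")(io.BytesIO(data))

# The BOM is read here, on the first read, not when the reader is created.
first = reader.readline()

# Jump to the start of the second line: 2 BOM bytes + 2 bytes per character.
reader.seek(2 + len("first line\n") * 2)

try:
    reader.readline()
    error = None
except UnicodeError as exc:
    # seek() reset the decoder, so it looked for a BOM at the new position.
    error = str(exc)

print(repr(first), error)
```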

If your file doesn't have a BOM, that's definitely the problem and you should specify the byte order when opening the file (see #2 below). Otherwise, I see two potential solutions:

1. Read the first two bytes of the file to get the BOM before you seek. You seem to say this didn't work, indicating that perhaps it's expecting a fresh UTF-16 stream after the seek, so:
2. Specify the byte order explicitly by using utf-16-le or utf-16-be as the encoding when you open the file.
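The two suggestions can be combined, roughly as in the sketch below: read the BOM once to learn the byte order, then decode with the matching explicit codec so that seeking no longer triggers a BOM check. The temporary file, the sample lines, and the computed offset are all illustrative assumptions, and `codecs.getreader` over a binary file is used here as a stand-in for `codecs.open`.

```python
import codecs
import os
import tempfile

# Hypothetical sample content standing in for the real file.
lines = ["first line\n", "second line\n", "third line\n"]

# Write a small UTF-16 file; the plain "utf-16" codec puts a BOM at the start.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "wb") as out:
    out.write("".join(lines).encode("utf-16"))

# Read the BOM once, up front, to find out the byte order.
with open(path, "rb") as raw:
    bom = raw.read(2)
if bom == codecs.BOM_UTF16_LE:
    encoding = "utf-16-le"
elif bom == codecs.BOM_UTF16_BE:
    encoding = "utf-16-be"
else:
    raise ValueError("no UTF-16 BOM found; the byte order must be known some other way")

# Byte offset of the second line: the BOM plus the encoded first line.
offset = len(bom) + len(lines[0].encode(encoding))

# The explicit-endian codec never looks for a BOM, so seeking works.
with open(path, "rb") as raw:
    reader = codecs.getreader(encoding)(raw)
    reader.seek(offset)
    result = reader.readline().strip()

os.remove(path)
print(result)
```

Offsets passed to `seek()` here are byte offsets into the file, so they should land on a 2-byte boundary after the BOM, which matches what the hex editor check in the question suggests.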

answered Sep 19 '22 16:09

kindall
