
Python binary EOF

Tags: python, binary, eof

I want to read through a binary file. Googling "python binary eof" led me here.

Now, the questions:

  1. Why does the container (x in the SO answer) contain not a single (current) byte but a whole bunch of them? What am I doing wrong?
  2. If that is the expected behaviour and I am doing nothing wrong, how do I read a single byte? That is, is there any way to detect EOF while reading the file with the read(1) method?
mekkanizer asked Aug 23 '14 19:08




1 Answer

To quote the documentation:

file.read([size])

Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread() more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.

That means (for a regular file):

  • f.read(1) will return a bytes object containing either 1 byte, or 0 bytes if EOF was reached
  • f.read(2) will return a bytes object containing either 2 bytes, or 1 byte if EOF is reached after the first byte, or 0 bytes if EOF is encountered immediately.
  • ...
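
The cases listed above can be checked with a small sketch. It assumes Python 3 and uses io.BytesIO as an in-memory stand-in for a regular file opened in 'rb' mode; the three-byte payload is made up for illustration.

```python
import io

# In-memory stand-in for a 3-byte file opened in 'rb' mode (illustrative data).
f = io.BytesIO(b"abc")

first = f.read(2)   # two bytes are available, so two bytes come back
second = f.read(2)  # only one byte is left before EOF
third = f.read(2)   # already at EOF: an empty bytes object

print(first, second, third)  # b'ab' b'c' b''
```

The empty result is falsy, which is why the "if not b" test works as an EOF check.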

If you want to read your file one byte at a time, you will have to read(1) in a loop and test for "emptiness" of the result:

# From answer by @Daniel
with open(filename, 'rb') as f:
    while True:
        b = f.read(1)
        if not b:
            # eof
            break
        do_something(b)

If you want to read your file in chunks of, say, 50 bytes at a time, you will have to call read(50) in a loop:

with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        if not b:
            # eof
            break
        do_something(b) # <- be prepared to handle a last chunk of length < 50
                        #    if the file length *is not* a multiple of 50
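
As an alternative spelling of the same loop (not part of the original answer), the two-argument form of iter() can hide the EOF test. This is a minimal sketch assuming Python 3 and a file opened in binary mode; the helper name iter_chunks is hypothetical, and io.BytesIO stands in for a real file.

```python
import io
from functools import partial

def iter_chunks(fileobj, size=50):
    """Yield successive chunks of at most `size` bytes until EOF."""
    # iter(callable, sentinel) calls fileobj.read(size) repeatedly and
    # stops as soon as a call returns the sentinel b"" (that is, EOF).
    yield from iter(partial(fileobj.read, size), b"")

# 120 zero bytes as illustrative data: two full chunks and a short last one.
sample = io.BytesIO(bytes(120))
lengths = [len(chunk) for chunk in iter_chunks(sample)]
print(lengths)  # [50, 50, 20]
```

With a real file this would be used as "for chunk in iter_chunks(f): do_something(chunk)" inside the "with open(filename, 'rb') as f:" block. The sentinel must be b"" rather than "" because the file is read in binary mode.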

In fact, you may even break one iteration sooner:

with open(filename, 'rb') as f:
    while True:
        b = f.read(50)
        do_something(b) # <- be prepared to handle a last chunk of size 0
                        #    if the file length *is* a multiple of 50
                        #    (incl. 0 byte-length file!)
                        #    and be prepared to handle a last chunk of length < 50
                        #    if the file length *is not* a multiple of 50
        if len(b) < 50:
            break
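
To see which chunk sizes the early-break variant hands to do_something, here is a small sketch assuming Python 3. The function early_break_lengths is a hypothetical helper that records chunk lengths instead of processing the data, and io.BytesIO again stands in for a regular file.

```python
import io

def early_break_lengths(data, size=50):
    """Return the length of every chunk the early-break loop processes."""
    seen = []
    f = io.BytesIO(data)
    while True:
        b = f.read(size)
        seen.append(len(b))  # stands in for do_something(b)
        if len(b) < size:
            break
    return seen

print(early_break_lengths(bytes(100)))  # [50, 50, 0]
print(early_break_lengths(bytes(120)))  # [50, 50, 20]
print(early_break_lengths(b""))         # [0]
```

As the quoted documentation notes, a short read only implies EOF for regular files; for ttys or non-blocking streams the "if not b" test is the safer check.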

Concerning the other part of your question:

Why does the container [..] contain [..] a whole bunch of them [bytes]?

Referring to that code:

for x in file:
    i = i + 1
    print(x)

To quote again the doc:

A file object is its own iterator, [..]. When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing).

The code above reads a binary file line by line, that is, it stops at each occurrence of the EOL character (\n). This usually produces chunks of varying length, since most binary files contain that byte at more or less random positions.
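
A short sketch of that effect, assuming Python 3: iterating over a binary file object splits the data after each b"\n" byte. The payload below is made up for illustration, and io.BytesIO follows the same iteration protocol as a file opened in 'rb' mode.

```python
import io

# Illustrative binary payload with two 0x0A (b"\n") bytes at arbitrary positions.
data = b"\x00\x01\n\x02\x03\x04\x05\n\x06"

# Same iteration as "for x in file": each item ends at the next b"\n".
pieces = list(io.BytesIO(data))
print(pieces)  # [b'\x00\x01\n', b'\x02\x03\x04\x05\n', b'\x06']
```

Each x is therefore a run of bytes of unpredictable length, not a single byte.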

I wouldn't encourage you to read a binary file that way. Prefer a solution based on read(size).

Sylvain Leroux answered Nov 15 '22 20:11
