
read a binary file (python)

Tags: python, file, io

I can't read a file, and I don't understand why:

f = open("test/test.pdf", "r")
data = list(f.read())
print data

Returns : []

I would like to open a PDF, extract every byte, and put the bytes in a list.

What's wrong with my code? :(

Thanks,

beratch asked Mar 23 '10 01:03



People also ask

How do I read a binary file in Python?

The open() function opens a file in text format by default. To open a file in binary format, add 'b' to the mode parameter. Hence the "rb" mode opens the file in binary format for reading, while the "wb" mode opens the file in binary format for writing.

How do I read a large binary file in Python?

To read from a binary file, we need to open it with the mode rb instead of the default mode of rt:

>>> with open("exercises.zip", mode="rb") as zip_file:
...     contents = zip_file.read()
...


3 Answers

f = open("test/test.pdf", "rb")

You must include the pseudo-mode "b" for binary when reading and writing on Windows. Otherwise the OS silently translates what it considers to be "line endings", corrupting the data you read or write.
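As a rough sketch of what the asker seems to want, the snippet below opens a file in "rb" mode and turns its contents into a list of byte values. It assumes Python 3 (where list() on a bytes object yields integers, not one-character strings as in the Python 2 code in the question). The helper name read_bytes_as_list and the sample payload are made up for illustration; a temporary file stands in for test/test.pdf.

```python
import os
import tempfile


def read_bytes_as_list(path):
    """Return every byte of the file at `path` as a list of ints (0-255)."""
    # "rb" turns off newline translation and text decoding.
    with open(path, "rb") as f:
        return list(f.read())


# Demo on a throwaway file; the payload is invented for this example.
payload = b"%PDF-1.4\r\n\x1a\x00\xff"
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(payload)
    demo_path = tmp.name

byte_list = read_bytes_as_list(demo_path)
os.remove(demo_path)

print(byte_list[:5])  # [37, 80, 68, 70, 45], the bytes of "%PDF-"
```

The payload deliberately contains "\r\n" and 0x1A, which are the kind of bytes that text mode on Windows may alter or treat specially; in binary mode they come back unchanged.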

Jonathan Feinberg answered Sep 22 '22 12:09



Jonathan is correct that you should open the file in binary mode if you are on Windows.

However, a PDF file will start with "%PDF-", which would at least be read in regardless of whether you are using binary mode or not.

So it appears to me that your "test/test.pdf" is an empty file.
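One way to test the empty-file theory is to look at the file's size and its first few bytes. The sketch below assumes Python 3; describe_file is a hypothetical helper, and an empty temporary file stands in for test/test.pdf.

```python
import os
import tempfile


def describe_file(path):
    """Return (size_in_bytes, first_five_bytes) for the file at `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(5)  # a real PDF would give b"%PDF-" here
    return size, header


# Demo with an empty stand-in file.
fd, empty_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)

size, header = describe_file(empty_path)
os.remove(empty_path)

print(size, header)  # 0 b''
```

A size of 0 would explain the empty list in the question; a non-zero size with a header other than b"%PDF-" would suggest the file is not a PDF at all.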

John La Rooy answered Sep 21 '22 12:09



  • As far as I understand the PDF format, a PDF file isn't meant to be a purely binary file. It should be a text file that may contain lots of binary blobs. I could be wrong.
  • On Windows, if you are opening a binary file, you need to include b in the mode of your file, i.e. open(filename, "rb").
    • On Unix-like systems, the b doesn't hurt anything, though it does not mean anything.
  • Always use a context manager with your files. That is to say, instead of writing f = open("test/test.pdf", "rb"), say with open("test/test.pdf", "rb") as f:. This will ensure your file always gets closed.
  • list(f.read()) is not likely to be useful code very often. f.read() returns a str, and calling list on it makes a list of the characters (one-byte strings). This is very seldom needed.
  • Binary or text or whatever, read should work. Are you positive that there is anything in test/test.pdf? Python does not seem to think there is.
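To illustrate the context-manager and list(f.read()) points, here is a small sketch, assuming Python 3, where f.read() in binary mode returns a bytes object rather than a str. The temporary file and its contents are invented for the example.

```python
import os
import tempfile

# Throwaway stand-in for test/test.pdf.
fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as out:
    out.write(b"%PDF-1.4\n")

with open(path, "rb") as f:
    data = f.read()  # a bytes object in Python 3

file_closed = f.closed  # the with block closed the file on exit
first_byte = data[0]    # indexing bytes gives an int, so no list is needed
os.remove(path)

print(file_closed, first_byte, len(data))  # True 37 9
```

Since a bytes object already supports indexing, slicing, and iteration, wrapping it in list() is only worth it when a mutable list of integers is really required.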
Mike Graham answered Sep 21 '22 12:09
