Python: File doesn't read whole file, io.FileIO does - why?

Tags: python, io

The following code, executed in Python 2.7.2 on Windows, reads only a fraction of the underlying file:

import os

in_file = open(os.path.join(settings.BASEPATH,'CompanyName.docx'))
incontent = in_file.read()
in_file.close()

while this code works just fine:

import io
import os

in_file = io.FileIO(os.path.join(settings.BASEPATH,'CompanyName.docx'))
incontent = in_file.read()
in_file.close()

Why the difference? From my reading of the docs, they should perform identically.

Marcin asked Jan 30 '12 18:01



1 Answer

You need to open the file in binary mode. In text mode on Windows, Python 2's open() treats the byte 0x1A (Ctrl-Z) as an end-of-file marker, so read() stops at the first one it finds. A docx is a ZIP file, and its compressed data is almost certain to contain that byte somewhere.

Try

in_file = open(os.path.join(settings.BASEPATH,'CompanyName.docx'), "rb")

FileIO reads raw byte streams, and those are always "binary", so it needs no mode flag to read the whole file.
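
As a rough sketch of the point above, the snippet below writes a small file containing a 0x1A byte and reads it back both ways. The payload, the `sample.bin` file name, and the helper names `read_binary` and `read_fileio` are made up for illustration and are not from the question. Note that the truncation itself only shows up with text mode on Windows under Python 2; this sketch just checks that the binary-mode read and io.FileIO return every byte.

```python
import io
import os
import tempfile

# Hypothetical payload: a few bytes around 0x1A (Ctrl-Z), standing in for ZIP data.
payload = b"PK\x03\x04 before \x1a after \r\n end"


def read_binary(path):
    """Read the whole file as raw bytes with the built-in open() in "rb" mode."""
    with open(path, "rb") as in_file:
        return in_file.read()


def read_fileio(path):
    """Read the whole file through io.FileIO, which never applies text-mode rules."""
    with io.FileIO(path) as in_file:
        return in_file.read()


# Write the sample file in binary mode so no newline translation happens on write.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.bin")
with open(sample_path, "wb") as out_file:
    out_file.write(payload)

via_open = read_binary(sample_path)
via_fileio = read_fileio(sample_path)

# All three lengths should match: nothing is cut off at the 0x1A byte.
print(len(payload), len(via_open), len(via_fileio))
```

Using a `with` block also closes the file if read() raises, which the explicit close() calls in the question would skip.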

Tim Pietzcker answered Sep 20 '22 01:09