The following code, executed in Python 2.7.2 on Windows, reads only a fraction of the underlying file:
import os
in_file = open(os.path.join(settings.BASEPATH,'CompanyName.docx'))
incontent = in_file.read()
in_file.close()
while this code works just fine:
import io
import os
in_file = io.FileIO(os.path.join(settings.BASEPATH,'CompanyName.docx'))
incontent = in_file.read()
in_file.close()
Why the difference? From my reading of the docs, they should perform identically.
To read a file in Python, we must open the file in reading mode. There are various methods available for this purpose. We can use the read(size) method to read at most size units of data (bytes, for a file opened in binary mode). If the size parameter is not specified, it reads and returns everything up to the end of the file.
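As a minimal sketch of the read(size) behaviour described above: the file name and its contents are hypothetical and are created in a temporary directory only for this illustration.

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    first = f.read(4)    # at most 4 bytes from the current position
    rest = f.read()      # no size given: everything up to end of file
    at_eof = f.read()    # already at the end, so this is empty
```

Here first holds the first four bytes, rest holds the remaining six, and at_eof is an empty bytes object.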
You can do most file manipulation using a file object. Before you can read or write a file, you have to open it using Python's built-in open() function. This function creates a file object, which you then use to call the other methods associated with it.
When we are done, the file needs to be closed so that the resources tied to it are freed. Hence, in Python, a file operation takes place in the following order: open the file, read or write (perform the operation), then close the file.
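The open, operate, close order can be sketched in two ways: with an explicit close() guarded by try/finally, or with a with block that closes the file automatically. The file name below is a hypothetical placeholder in a temporary directory.

```python
import os
import tempfile

# Hypothetical file path used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Explicit open / write / close; finally ensures close() runs even on error.
out_file = open(path, "w")
try:
    out_file.write("hello")
finally:
    out_file.close()

# Equivalent pattern for reading: the with block closes the file on exit.
with open(path) as in_file:
    content = in_file.read()

closed_after_with = in_file.closed
```

The with form is usually preferred because the close cannot be forgotten.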
When working in Python, you don’t have to worry about importing any external libraries to work with files. Python comes with “batteries included”, and the file I/O tools and utilities are built into the language and its standard library.
You need to open the file in binary mode, or read() will stop at the first EOF character it finds. (In text mode on Windows, Python 2 treats the byte 0x1A, Ctrl-Z, as end of file.) And a docx is a ZIP file, which is almost certain to contain such a byte somewhere.
Try
in_file = open(os.path.join(settings.BASEPATH,'CompanyName.docx'), "rb")
FileIO reads raw byte streams, and those are "binary" by default.
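A small sketch of the point, under stated assumptions: the payload below is a hypothetical stand-in for the .docx (a few bytes with 0x1A in the middle), written to a temporary file. It only demonstrates that a binary-mode open() and io.FileIO both return every byte; the truncated text-mode read from the question is specific to Python 2 on Windows and is not reproduced here.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the .docx: ZIP-like bytes with 0x1A (Ctrl-Z) inside.
payload = b"PK\x03\x04before\x1aafter"
path = os.path.join(tempfile.mkdtemp(), "sample.docx")
with open(path, "wb") as f:
    f.write(payload)

# Binary mode: no end-of-file or newline translation is applied.
with open(path, "rb") as f:
    via_open = f.read()

# io.FileIO is a raw byte stream, so it behaves like binary mode.
with io.FileIO(path) as f:
    via_fileio = f.read()

size_on_disk = os.path.getsize(path)
```

Both reads return the full payload, and their length matches the size reported by os.path.getsize.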