I'm trying to decide on the best internal interface to use in my code, specifically around how to handle file contents. Really, the file contents are just binary data, so bytes is sufficient to represent them.
I'm storing files in different remote locations, so I have a couple of different classes for reading and writing them. I'm trying to figure out the best interface to use for my functions. Originally I was using file paths, but that was suboptimal because it meant the disk was always involved (which meant lots of clumsy temp files).
There are several areas of the code that have the same requirement and would directly use whatever this interface returns. As a result, whatever abstraction I choose will touch a fair bit of code.
What are the various tradeoffs to using BytesIO vs bytes?
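```python
# Option 1: pass bytes around
def put_file(location, contents_as_bytes): ...
def get_file_contents(location): ...  # returns bytes

# Option 2: pass file-like objects around
def put_file(location, contents_as_fp): ...
def get_file_contents(location, fp): ...  # writes into fp
```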
Playing around, I've found that using the file-like interfaces (BytesIO, etc.) requires a bit of administrative overhead in terms of `seek(0)` etc. That raises questions like:

- Should you `seek` before you start, or after you've finished?
- Should you `seek` to the start, or just operate from the position the file is in?
- Should you `tell()` to maintain the position?
- `shutil.copyfileobj`, for example, doesn't do any seeking.

One advantage I've found with file-like interfaces is that they allow passing in the fp to write into when retrieving data, which seems to give a good deal of flexibility.
```python
import io

def get_file_contents(location, write_into=None):
    # Fall back to an in-memory buffer when the caller doesn't supply one
    if write_into is None:
        write_into = io.BytesIO()
    # get the contents and put it into write_into
    return write_into

# The caller decides where the data ends up:
get_file_contents('blah', file_on_disk)
get_file_contents('blah', gzip_file)
get_file_contents('blah', temp_file)
get_file_contents('blah', bytes_io)
new_bytes_io = get_file_contents('blah')
# etc
```
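To see how the `write_into` pattern and the seek questions interact, here is a runnable sketch. It fills in the placeholder body with a hypothetical in-memory `_REMOTE` dict (a real backend would stream from the network) and copies with `shutil.copyfileobj`, so the target's position is left wherever writing ended.

```python
import gzip
import io
import shutil
import tempfile

# Hypothetical in-memory stand-in for a remote location.
_REMOTE = {'blah': b'some binary contents'}

def get_file_contents(location, write_into=None):
    """Write the contents of `location` into `write_into` and return it."""
    if write_into is None:
        write_into = io.BytesIO()
    # copyfileobj reads and writes at the current positions; it never seeks.
    shutil.copyfileobj(io.BytesIO(_REMOTE[location]), write_into)
    return write_into

# Default target: a fresh BytesIO whose pointer sits at the END after writing.
buf = get_file_contents('blah')
print(buf.read())  # b'' because the pointer is past the data
buf.seek(0)
print(buf.read())  # b'some binary contents'

# An anonymous temp file on disk behaves the same way.
with tempfile.TemporaryFile() as tmp:
    get_file_contents('blah', tmp)
    tmp.seek(0)
    print(tmp.read())  # b'some binary contents'

# A gzip stream wrapped around a BytesIO works too.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode='wb') as gz:
    get_file_contents('blah', gz)
print(gzip.decompress(raw.getvalue()))  # b'some binary contents'
```

In this sketch the caller is responsible for rewinding, mirroring `shutil.copyfileobj`. Rewinding inside the function would not work for every target: a gzip stream opened for writing, for example, does not support seeking backwards.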
Is there a good reason to prefer BytesIO over just using fixed bytes when designing an interface in Python?
The benefit of `io.BytesIO` objects is that they implement a common-ish interface (commonly known as a "file-like" object). `BytesIO` objects have an internal pointer, whose position is returned by `tell()`, and every call to `read(n)` advances the pointer by the number of bytes actually read (at most `n`). For example:
```python
import io

buf = io.BytesIO(b'Hello world!')
buf.read(1)  # Returns b'H'
buf.tell()   # Returns 1
buf.read(1)  # Returns b'e'
buf.tell()   # Returns 2

# Set the pointer back to 0.
buf.seek(0)
buf.read(1)  # Returns b'H' again, like the first call.
```
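Not every access goes through the pointer, though. `BytesIO.getvalue()` returns the whole buffer regardless of the current position and does not move the pointer, which avoids the `seek(0)` bookkeeping when you simply want the bytes back:

```python
import io

buf = io.BytesIO(b'Hello world!')
buf.read(5)            # b'Hello'; the pointer is now at 5
print(buf.tell())      # 5
print(buf.getvalue())  # b'Hello world!' (whole buffer, pointer ignored)
print(buf.tell())      # 5 (getvalue() leaves the pointer alone)
print(buf.read())      # b' world!' (reading resumes from position 5)
```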
In your use case, both the `bytes` object and the `io.BytesIO` object may not be the best solutions, because both hold the complete contents of your files in memory.
Instead, you could look at `tempfile.TemporaryFile` (https://docs.python.org/3/library/tempfile.html).
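A minimal sketch of that suggestion, assuming the data arrives as a readable binary stream (a `BytesIO` stands in here for a hypothetical network response) and using a hypothetical helper name `spool_to_tempfile`. Copying in fixed-size chunks keeps only one chunk in memory at a time, and the temp file is deleted automatically when it is closed.

```python
import io
import shutil
import tempfile

def spool_to_tempfile(source, chunk_size=64 * 1024):
    """Copy a readable binary stream into an anonymous temp file, chunk by chunk."""
    tmp = tempfile.TemporaryFile()  # opened in 'w+b' mode by default
    shutil.copyfileobj(source, tmp, chunk_size)  # reads at most chunk_size bytes at a time
    tmp.seek(0)  # rewind so the caller can read from the start
    return tmp

# Stand-in for a large remote download.
source = io.BytesIO(b'x' * 200_000)
with spool_to_tempfile(source) as tmp:
    data = tmp.read(10)
    print(data)  # b'xxxxxxxxxx'
```

If most files are small, `tempfile.SpooledTemporaryFile` from the same module keeps data in memory until it exceeds `max_size` and only then rolls over to disk.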