How do I read an Excel file directly from Dropbox's API using pandas.read_excel()?

I'm interested in comparing two versions of smallish Excel files stored in Dropbox as separate versions.

Using the Python SDK, specifically the files_download() method, I'm getting a requests.models.Response object, but I'm having trouble getting pandas.read_excel() to consume it.

Here's the code snippet:

with open(resp.content, "rb") as handle:
    df = pandas.read_excel(handle.read())

The error:

TypeError('file() argument 1 must be encoded string without null bytes, not str',)

I know I'm missing something fundamental, possibly needing to encode the file as binary. (I tried base64.b64encode and a few other things, with no success yet.) I'm hoping someone can point me in the right direction, possibly with the io module?

I'm using Python 2.7.15.

For the avoidance of doubt, I'm specifically looking to avoid first saving the Excel files to the filesystem. I'm sure I could accomplish the broader objective that way, but to optimize I'm trying to read the files from Dropbox directly into pandas DataFrames. Since read_excel() accepts a file-like object, I think I should be able to do that.

In short, the problem is this: I need to get the response from Dropbox into the form of a file-like object.

HaPsantran asked Jan 31 '26 23:01


1 Answer

The following code will do what you want.

# Imports and initialization of variables
from contextlib import closing  # ensures the HTTP response is closed after use
import io

import dropbox
import pandas as pd

token = "YOURTOKEN"  # get a token at https://www.dropbox.com/developers/apps/
dbx = dropbox.Dropbox(token)
yourpath = "/somefile.xlsx"  # Dropbox API paths start with "/"; not limited to Excel files

# Relevant streamer
def stream_dropbox_file(path):
    # files_download returns a (FileMetadata, requests.Response) tuple
    _, res = dbx.files_download(path)
    with closing(res) as result:
        byte_data = result.content
    # Wrap the raw bytes in an in-memory, file-like object
    return io.BytesIO(byte_data)

# Usage
file_stream = stream_dropbox_file(yourpath)
df = pd.read_excel(file_stream)
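Since the question's goal is to compare two versions of the same file, the streamer can be generalized slightly. The sketch below is a variant, not part of the original answer: it passes the client in explicitly instead of using the global `dbx`, and it relies on the Dropbox SDK's `files_download(path, rev=None)` accepting an optional revision identifier. The helper name `download_as_filelike` is hypothetical.

```python
from contextlib import closing
import io


def download_as_filelike(client, path, rev=None):
    """Download a Dropbox file (optionally a specific revision) into memory.

    `client` is assumed to be a dropbox.Dropbox instance; this helper's
    name is hypothetical. Returns an io.BytesIO positioned at the start.
    """
    # files_download returns a (FileMetadata, requests.Response) tuple;
    # rev=None asks for the current version of the file
    _, response = client.files_download(path, rev=rev)
    # closing() calls response.close() once the body has been read
    with closing(response) as result:
        data = result.content
    return io.BytesIO(data)
```

With such a helper, each version could be loaded with something like `pd.read_excel(download_as_filelike(dbx, path, rev=old_rev))`, where `old_rev` is a placeholder. Revision identifiers can be listed with `dbx.files_list_revisions(path)`, whose result's `entries` are `FileMetadata` objects carrying a `rev` attribute.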

The nice part of this approach is that io.BytesIO wraps the downloaded bytes in a general file-like object, so you can also use it to read other formats, such as CSV files with pd.read_csv().
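As a small illustration of that point, the sketch below feeds in-memory CSV bytes (an assumed stand-in for `result.content`) to `pd.read_csv()`. One detail worth knowing: a `BytesIO` stream is consumed as it is read, so reusing the same object for a second reader requires rewinding it with `seek(0)` first.

```python
import io

import pandas as pd

# Stand-in for the bytes a Dropbox download would return (made-up CSV content)
raw_bytes = b"name,value\nalpha,1\nbeta,2\n"

stream = io.BytesIO(raw_bytes)
first = pd.read_csv(stream)

# The stream is now positioned at the end; rewind before reading it again
stream.seek(0)
second = pd.read_csv(stream)
```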

The code should also work for non-pandas io methods, such as loading images, but I haven't tested that explicitly.
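The same wrapping works for readers outside pandas. Because an image example would need a third-party library, the stdlib-only sketch below swaps in `zipfile` instead: it builds a tiny ZIP archive in memory, treats its raw bytes as a stand-in for a downloaded file, and reopens them through `io.BytesIO`. This is a fitting stand-in because an `.xlsx` workbook is itself a ZIP container, and readers like this expect a seekable file-like object rather than a raw bytes string. The archive name and contents are made up for the example.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory to stand in for downloaded file bytes
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "hello from memory")
downloaded_bytes = buffer.getvalue()

# Wrap the raw bytes so zipfile can seek and read them like a file on disk
with zipfile.ZipFile(io.BytesIO(downloaded_bytes)) as archive:
    names = archive.namelist()
    text = archive.read("example.txt").decode("utf-8")
```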

Ivo Merchiers answered Feb 03 '26 12:02