Using openpyxl to read file from memory

Tags: python, excel

I downloaded a Google spreadsheet as a file-like object in Python.

How can I open the workbook with openpyxl without having to save it to disk first?

I know that xlrd can do this by:

book = xlrd.open_workbook(file_contents=downloaded_spreadsheet.read()) 

with "downloaded_spreadsheet" being my downloaded xlsx-file as an object.

Instead of xlrd, I want to use openpyxl, because (from what I have read) it has better xlsx support.

I'm using this so far...

#!/usr/bin/python

import openpyxl
import xlrd     # which to use..?
import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="xls"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    email = "[email protected]" # (your email here)
    password = '.....'
    spreadsheet_id = "......" # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    downloaded_spreadsheet = gs.download(ss)

    # book = xlrd.open_workbook(file_contents=downloaded_spreadsheet.read(), formatting_info=True)

    # It works.. alas xlrd doesn't support the xlsx-functionality that I want...
    # i.e. being able to read the cell-colordata..

I hope someone can help, because I have been struggling for months to get the color data of a given cell in a Google spreadsheet. (I know the Google API doesn't support it.)

Kaspar128 asked Dec 17 '13 13:12




1 Answer

In the docs for load_workbook it says:

#:param filename: the path to open or a file-like object 

So it was capable of this all along: load_workbook accepts either a path or a file-like object. I only had to wrap the contents of the file-like object returned by urlopen in a byte stream:

from io import BytesIO
from openpyxl import load_workbook

# input_excel is the file-like object returned by urlopen
wb = load_workbook(filename=BytesIO(input_excel.read()))

and I can read every piece of data in my Google-spreadsheet.
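For anyone who wants to try this without a Google account, here is a minimal sketch of the same idea. It assumes openpyxl is installed. The helper names workbook_from_bytes and make_sample_xlsx_bytes are made up for this example, and the sample workbook is built in memory only to stand in for the downloaded file. The last lines also read back a cell's fill color, since that was the goal of the question.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill


def workbook_from_bytes(data):
    """Load an .xlsx workbook from raw bytes without writing to disk."""
    # An .xlsx file is a zip archive, so openpyxl needs a seekable
    # file-like object; BytesIO provides one around the raw bytes.
    return load_workbook(filename=BytesIO(data))


def make_sample_xlsx_bytes():
    """Build a tiny workbook in memory (stand-in for the real download)."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = "hello"
    # Solid red fill, given as an ARGB hex string.
    ws["A1"].fill = PatternFill(fill_type="solid", fgColor="FFFF0000")
    buffer = BytesIO()
    wb.save(buffer)  # save() also accepts a file-like object
    return buffer.getvalue()


# In the real script this would be downloaded_spreadsheet.read().
data = make_sample_xlsx_bytes()

wb = workbook_from_bytes(data)
cell = wb.active["A1"]
print(cell.value)
print(cell.fill.fgColor.rgb)
```

Two caveats. openpyxl only reads the xlsx family of formats, so the export has to be requested as xlsx rather than the "xls" default used in the question's download method. Also, in a real export the fill may be stored as a theme or indexed color rather than a plain ARGB string, in which case fgColor.rgb will not hold a usable hex value; I have not checked what Google's export produces.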

Kaspar128 answered Sep 21 '22 06:09
