Pandas read_excel() fails with xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

I'm trying to use pandas to parse an .xlsm document. My code worked perfectly with the example file I was given, but once I got the rest of the documents, it failed with the above error. Here's the offending stack trace:

Traceback (most recent call last):
  File "@@@@@@@@/UnsupervisedCAM.py", line 9, in <module>
    info_dict = read_excel_to_dict('files/' + filename)
  File "@@@@@@@@\readCAM.py", line 7, in read_excel_to_dict
    df = pandas.read_excel(filename, parse_cols='E,G,I,K,Q,O')
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 191, in read_excel
    io = ExcelFile(io, engine=engine)
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 249, in __init__
    self.book = xlrd.open_workbook(io)
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
    ragged_rows=ragged_rows,
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 87, in open_workbook_xls
    ragged_rows=ragged_rows,
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 595, in biff2_8_load
    raise XLRDError("Can't find workbook in OLE2 compound document")
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

I'm not even sure where to start... Haven't found anything of use online.

asked Jun 12 '17 14:06 by bendl


2 Answers

I got the same error message and solved it by removing the password protection from the .xlsx file. (I'm not saying that's the only cause of this error, but it's worth checking!)
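
A quick way to check for this without opening each file in Excel is to look at the first few bytes. This is only a rough sketch: it relies on the fact that a normal .xlsx/.xlsm file is a ZIP archive, while a password-protected one is wrapped in an OLE2 compound document (the same container legacy .xls uses). The names sniff_container and find_ole2_wrapped are made up for illustration and are not part of pandas or xlrd. An "ole2" result does not prove the file is encrypted; it could also be an .xls file saved with the wrong extension.

```python
from pathlib import Path

# Signatures found at the start of the two container formats Excel uses.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls, and encrypted OOXML
ZIP_MAGIC = b"PK\x03\x04"                        # ordinary .xlsx / .xlsm


def sniff_container(path):
    """Return 'ole2', 'zip', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def find_ole2_wrapped(folder, pattern="*.xlsm"):
    """List names of files matching `pattern` whose content is OLE2, not ZIP."""
    return sorted(
        p.name
        for p in Path(folder).glob(pattern)
        if sniff_container(p) == "ole2"
    )
```

Calling find_ole2_wrapped('files') on the folder from the question would list the workbooks worth checking for password protection first; anything reported as "zip" is a regular workbook and is probably failing for some other reason.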

answered Oct 24 '22 23:10 by ivegotaquestion


After a lot of searching, the only way I've found to do this is to open and re-save each Excel document, which seems to 'strip' the OLE2 wrapper from it. I automated the process with the following VBScript:

' Open and re-save every matching workbook in a folder (requires Excel).
Dim objFSO, objFolder, objFile
Dim objExcel, objWB
Dim MyFolder
Set objExcel = CreateObject("Excel.Application")
Set objFSO = CreateObject("Scripting.FileSystemObject")
MyFolder = "<PATH/TO/FILES>"
Set objFolder = objFSO.GetFolder(MyFolder)
For Each objFile In objFolder.Files
    ' Right(..., 4) compares the last four characters, so use e.g. "xlsm"
    If Right(objFile.Name, 4) = "<EXTENSION>" Then
        Set objWB = objExcel.Workbooks.Open(objFile.Path)
        objWB.Save
        objWB.Close
    End If
Next
objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing
WScript.Echo "Done"

Make sure to change the path to the folder and extension.

answered Oct 24 '22 22:10 by bendl