
AttributeError: 'ElementTree' object has no attribute 'getiterator' when trying to import excel file

This is my code. I've just installed JupyterLab and added the Excel file there. I get the same error if I change the path to where the file is on my system. I can't find anyone who had the same problem when simply importing an Excel file as a DataFrame.

The Excel file is a 3x26 table with studentnr, course, and result columns holding values such as 101-105, A-D, and 1.0-9.9 respectively. Maybe the problem lies with the Excel file?

Either way, I have no idea how to fix this.

import pandas as pd
import numpy as np
df = pd.read_excel('student-results.xlsx')

This is the error I'm getting:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-9d38e4d56bbe> in <module>
      1 import pandas as pd
      2 import numpy as np
----> 3 df = pd.read_excel('student-results.xlsx')

c:\python\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    294                 )
    295                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296             return func(*args, **kwargs)
    297 
    298         return wrapper

c:\python\lib\site-packages\pandas\io\excel\_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

c:\python\lib\site-packages\pandas\io\excel\_base.py in __init__(self, path_or_buffer, engine)
    865         self._io = stringify_path(path_or_buffer)
    866 
--> 867         self._reader = self._engines[engine](self._io)
    868 
    869     def __fspath__(self):

c:\python\lib\site-packages\pandas\io\excel\_xlrd.py in __init__(self, filepath_or_buffer)
     20         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     21         import_optional_dependency("xlrd", extra=err_msg)
---> 22         super().__init__(filepath_or_buffer)
     23 
     24     @property

c:\python\lib\site-packages\pandas\io\excel\_base.py in __init__(self, filepath_or_buffer)
    351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
--> 353             self.book = self.load_workbook(filepath_or_buffer)
    354         elif isinstance(filepath_or_buffer, bytes):
    355             self.book = self.load_workbook(BytesIO(filepath_or_buffer))

c:\python\lib\site-packages\pandas\io\excel\_xlrd.py in load_workbook(self, filepath_or_buffer)
     35             return open_workbook(file_contents=data)
     36         else:
---> 37             return open_workbook(filepath_or_buffer)
     38 
     39     @property

c:\python\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    128         if 'xl/workbook.xml' in component_names:
    129             from . import xlsx
--> 130             bk = xlsx.open_workbook_2007_xml(
    131                 zf,
    132                 component_names,

c:\python\lib\site-packages\xlrd\xlsx.py in open_workbook_2007_xml(zf, component_names, logfile, verbosity, use_mmap, formatting_info, on_demand, ragged_rows)
    810     del zflo
    811     zflo = zf.open(component_names['xl/workbook.xml'])
--> 812     x12book.process_stream(zflo, 'Workbook')
    813     del zflo
    814     props_name = 'docprops/core.xml'

c:\python\lib\site-packages\xlrd\xlsx.py in process_stream(self, stream, heading)
    264         self.tree = ET.parse(stream)
    265         getmethod = self.tag2meth.get
--> 266         for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
    267             if self.verbosity >= 3:
    268                 self.dump_elem(elem)

AttributeError: 'ElementTree' object has no attribute 'getiterator'
Ziggy asked Oct 08 '20 14:10


3 Answers

Try passing engine="openpyxl" to pd.read_excel. It resolved the same problem for me.

corridda answered Oct 18 '22 07:10


The error occurs when pandas reads the file through xlrd on Python 3.9+. xlrd falls back to calling getiterator() on an xml.etree.ElementTree object, and that method, which had previously been deprecated with a warning, was removed in Python 3.9.
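A quick way to check this on your own interpreter, using only the standard library. The XML snippet is made up for illustration and is not the real workbook content:

```python
import sys
import xml.etree.ElementTree as ET

# Tiny stand-in document (hypothetical, not an actual xl/workbook.xml).
root = ET.fromstring("<workbook><sheets><sheet name='Sheet1'/></sheets></workbook>")
tree = ET.ElementTree(root)

# getiterator() is only present on interpreters older than Python 3.9.
has_getiterator = hasattr(tree, "getiterator")

# iter() is the replacement and walks the tree in document order.
tags = [elem.tag for elem in tree.iter()]

print(sys.version_info[:2], "getiterator available:", has_getiterator)
print(tags)
```

If the first line reports that getiterator is not available, any library that still calls it will fail with the AttributeError from the question.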

A workaround is to install a different engine, openpyxl, and tell pandas to use it when reading the Excel file.

First,

pip3 install openpyxl

Then, instead of pd.read_excel('student-results.xlsx'), write pd.read_excel('student-results.xlsx', engine='openpyxl')
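If you read both old .xls and newer .xlsx files, a small helper can pick the engine for you. This is only a sketch: choose_excel_engine and its is_installed parameter are names I made up, and it assumes openpyxl for .xlsx/.xlsm and xlrd for .xls.

```python
from importlib.util import find_spec
from pathlib import Path


def _installed(name):
    # True if the package can be imported in this environment.
    return find_spec(name) is not None


def choose_excel_engine(path, is_installed=_installed):
    """Return an engine name to pass to pd.read_excel (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xlsm"}:
        engine = "openpyxl"
    elif suffix == ".xls":
        engine = "xlrd"
    else:
        raise ValueError(f"unsupported Excel extension: {suffix!r}")
    if not is_installed(engine):
        raise ImportError(f"{engine} is needed for {suffix} files: pip install {engine}")
    return engine


# Usage (needs pandas installed):
# import pandas as pd
# path = "student-results.xlsx"
# df = pd.read_excel(path, engine=choose_excel_engine(path))
```

For the file in the question this returns "openpyxl", so the call ends up the same as the one above.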

Reference: Python bug tracker

Joel G Mathew answered Oct 18 '22 06:10


I got the same error with xlrd (1.2.0) or xlrd3 (1.0.0) without pandas, but with Python 3.9. The following may interest those looking for an explanation:

It only happened when defusedxml was available (in that case, xlrd will use it). But it could be worked around, without changing any of the involved libraries:

import xlrd
xlrd.xlsx.ensure_elementtree_imported(False, None)
xlrd.xlsx.Element_has_iter = True

The second line ensures that Element_has_iter will not be reset when opening a workbook, so it stays True, as set in the third line. With this in place, xlrd uses iter instead of crashing on the missing getiterator.
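To show why the flag matters, here is a standalone sketch of the same kind of conditional that appears on the failing line of the traceback. It does not import xlrd; iter_elements and the sample XML are invented for the illustration:

```python
import xml.etree.ElementTree as ET


def iter_elements(tree, element_has_iter):
    # Same shape as the conditional in the traceback: the flag alone
    # decides which method is called, whether or not that method exists.
    return tree.iter() if element_has_iter else tree.getiterator()


tree = ET.ElementTree(ET.fromstring("<a><b/><c/></a>"))

# Deriving the flag from the object keeps it in sync with the interpreter.
flag = hasattr(tree, "iter")
tags = [elem.tag for elem in iter_elements(tree, flag)]
print(tags)

# With the flag wrongly False on Python 3.9+, the getiterator branch is taken:
try:
    iter_elements(tree, False)
except AttributeError as exc:
    print("fails as in the question:", exc)
```

Forcing the flag to True, as in the workaround, simply makes the first branch the one that runs.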

That said, I agree that moving to openpyxl in place of xlrd is a cleaner solution, at least until xlrd or xlrd3 possibly gets fixed. openpyxl appears to be more actively developed. In my case I have to adapt direct calls to those libraries, which is probably more work than just typing openpyxl instead of xlrd to tell pandas what to do, but I'll consider it.

So I agree with @corridda: use openpyxl. The others are right about the cause, but maybe this explains it a little further.

PhiM answered Oct 18 '22 07:10