This is my code. I've just installed JupyterLab and added the Excel file there. I get the same error if I change the path to where the file is on my system. I can't find anyone who has had the same problem when simply importing an Excel file as a DataFrame.
The Excel file is a 3x26 table with studentnr, course, and result columns that have values like 101-105, A-D, and 1.0-9.9 respectively. Maybe the problem lies with the Excel file?
Either way, I have no idea how to fix this.
import pandas as pd
import numpy as np
df = pd.read_excel('student-results.xlsx')
This is the error I'm getting:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-9d38e4d56bbe> in <module>
1 import pandas as pd
2 import numpy as np
----> 3 df = pd.read_excel('student-results.xlsx')
c:\python\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
294 )
295 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296 return func(*args, **kwargs)
297
298 return wrapper
c:\python\lib\site-packages\pandas\io\excel\_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
302
303 if not isinstance(io, ExcelFile):
--> 304 io = ExcelFile(io, engine=engine)
305 elif engine and engine != io.engine:
306 raise ValueError(
c:\python\lib\site-packages\pandas\io\excel\_base.py in __init__(self, path_or_buffer, engine)
865 self._io = stringify_path(path_or_buffer)
866
--> 867 self._reader = self._engines[engine](self._io)
868
869 def __fspath__(self):
c:\python\lib\site-packages\pandas\io\excel\_xlrd.py in __init__(self, filepath_or_buffer)
20 err_msg = "Install xlrd >= 1.0.0 for Excel support"
21 import_optional_dependency("xlrd", extra=err_msg)
---> 22 super().__init__(filepath_or_buffer)
23
24 @property
c:\python\lib\site-packages\pandas\io\excel\_base.py in __init__(self, filepath_or_buffer)
351 self.book = self.load_workbook(filepath_or_buffer)
352 elif isinstance(filepath_or_buffer, str):
--> 353 self.book = self.load_workbook(filepath_or_buffer)
354 elif isinstance(filepath_or_buffer, bytes):
355 self.book = self.load_workbook(BytesIO(filepath_or_buffer))
c:\python\lib\site-packages\pandas\io\excel\_xlrd.py in load_workbook(self, filepath_or_buffer)
35 return open_workbook(file_contents=data)
36 else:
---> 37 return open_workbook(filepath_or_buffer)
38
39 @property
c:\python\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
128 if 'xl/workbook.xml' in component_names:
129 from . import xlsx
--> 130 bk = xlsx.open_workbook_2007_xml(
131 zf,
132 component_names,
c:\python\lib\site-packages\xlrd\xlsx.py in open_workbook_2007_xml(zf, component_names, logfile, verbosity, use_mmap, formatting_info, on_demand, ragged_rows)
810 del zflo
811 zflo = zf.open(component_names['xl/workbook.xml'])
--> 812 x12book.process_stream(zflo, 'Workbook')
813 del zflo
814 props_name = 'docprops/core.xml'
c:\python\lib\site-packages\xlrd\xlsx.py in process_stream(self, stream, heading)
264 self.tree = ET.parse(stream)
265 getmethod = self.tag2meth.get
--> 266 for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
267 if self.verbosity >= 3:
268 self.dump_elem(elem)
AttributeError: 'ElementTree' object has no attribute 'getiterator'
You could try using the argument engine="openpyxl". It helped me to resolve the same problem.
The error occurs when pandas reads the file through xlrd on Python 3.9+, because xlrd calls the getiterator() method of xml.etree.ElementTree.ElementTree. That method had been deprecated with a warning for a long time and was removed in Python 3.9.
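To see the removal in isolation, here is a minimal sketch using only the standard library. The tiny XML string is made up for illustration and is not the real contents of an .xlsx workbook; the point is only that iter() is the replacement for the removed getiterator().

```python
import xml.etree.ElementTree as ET

# Made-up XML for illustration only, not a real workbook.xml.
root = ET.fromstring("<workbook><sheets><sheet name='Sheet1'/></sheets></workbook>")
tree = ET.ElementTree(root)

# On Python 3.9+ this is False, which is why xlrd's call fails.
has_getiterator = hasattr(tree, "getiterator")

# iter() walks the tree in document order and is the supported replacement.
tags = [elem.tag for elem in tree.iter()]
print(has_getiterator, tags)
```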
A workaround is to install another engine, openpyxl, and tell pandas to use it when reading the Excel file.
First,
pip3 install openpyxl
Then, instead of pd.read_excel('student-results.xlsx'), write pd.read_excel('student-results.xlsx', engine='openpyxl')
Reference: Python bug tracker
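If you read both old .xls and newer .xlsx files, one possible way to avoid hard-coding the engine is a small helper like the sketch below. The names pick_engine and read_results are hypothetical, not part of pandas, and the extension-to-engine mapping is my assumption about a sensible default (openpyxl for .xlsx/.xlsm, xlrd for legacy .xls).

```python
from pathlib import Path


def pick_engine(path):
    """Hypothetical helper: choose a pandas Excel engine from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xlsm"}:
        return "openpyxl"  # avoids xlrd's .xlsx code path entirely
    if suffix == ".xls":
        return "xlrd"  # legacy binary format
    raise ValueError(f"Unsupported Excel extension: {suffix!r}")


def read_results(path):
    """Hypothetical wrapper around pd.read_excel with an explicit engine."""
    import pandas as pd  # imported here so pick_engine works without pandas

    return pd.read_excel(path, engine=pick_engine(path))
```

With the file from the question, read_results('student-results.xlsx') would call pd.read_excel with engine='openpyxl', provided openpyxl is installed.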
I got the same error with xlrd (1.2.0) and xlrd3 (1.0.0) without pandas, but with Python 3.9. The following may interest those looking for an explanation:
It only happened when defusedxml was available (in that case, xlrd uses it). It could be worked around without changing any of the involved libraries:
import xlrd
xlrd.xlsx.ensure_elementtree_imported(False, None)
xlrd.xlsx.Element_has_iter = True
The second line ensures that Element_has_iter will not be reset when opening a workbook, so it remains True, as set in the third line. When this is done, xlrd uses iter instead of crashing on the missing getiterator.
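The same workaround can be wrapped in a guarded function so that it does nothing when it does not apply. This is only a sketch: the function name patch_xlrd_element_iter is hypothetical, and the guards assume that an xlrd version without the xlsx module (or without these two names) should simply be left alone.

```python
import importlib


def patch_xlrd_element_iter():
    """Hypothetical helper: apply the workaround above if this xlrd has the hook.

    Returns True if the patch was applied, False otherwise.
    """
    try:
        xlsx = importlib.import_module("xlrd.xlsx")
    except ImportError:
        # xlrd is not installed, or this version has no xlsx module.
        return False

    ensure = getattr(xlsx, "ensure_elementtree_imported", None)
    if ensure is None or not hasattr(xlsx, "Element_has_iter"):
        return False

    # Import ElementTree now so that xlrd does not reset the flag later.
    ensure(False, None)
    # Make xlrd take the tree.iter() branch instead of tree.getiterator().
    xlsx.Element_has_iter = True
    return True


applied = patch_xlrd_element_iter()
```

Call it once before xlrd.open_workbook(...); if it returns False, the patch was not applied and switching to openpyxl is the fallback.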
That said, I agree that moving to openpyxl in place of xlrd is a cleaner solution, at least until xlrd or xlrd3 possibly gets fixed. Openpyxl appears to be more actively developed. In my case, I have to adapt direct calls to those libraries, which is probably more work than just typing openpyxl instead of xlrd to tell pandas which engine to use, but I'll consider it.
So I agree with @corridda: use openpyxl. The others are right about the cause, but maybe this explains it in a little more detail.