What are the Python 3 options for efficiently (in terms of performance and memory) extracting the sheet names and, for a given sheet, the column names from a very large .xlsx file?
I've tried using pandas:
For sheet names, using pd.ExcelFile:
xl = pd.ExcelFile(filename)
return xl.sheet_names
For column names, using pd.ExcelFile:
xl = pd.ExcelFile(filename)
df = xl.parse(sheetname, nrows=2, **kwargs)
df.columns
For column names, using pd.read_excel with and without nrows (pandas > 0.23):
df = pd.read_excel(io=filename, sheet_name=sheetname, nrows=2)
df.columns
However, both pd.ExcelFile and pd.read_excel seem to read the entire .xlsx into memory and are therefore slow.
Thanks a lot!
Here is the easiest way I can share with you:
# read the sheet file
import pandas as pd
my_sheets = pd.ExcelFile('sheet_filename.xlsx')
my_sheets.sheet_names
According to this SO question, reading Excel files in chunks is not supported (see this issue on GitHub), and using nrows will always read the whole file into memory first.
Possible solutions:
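One option for the sheet names, offered as a minimal sketch rather than a definitive fix: an .xlsx file is a zip archive, and the sheet names are listed in a small XML part, so they can be read without touching any worksheet data. The helper name list_sheet_names is made up for this example. The sketch assumes the workbook part sits at the conventional path xl/workbook.xml and uses the standard (non-strict) SpreadsheetML namespace, which may not hold for every file.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the standard (transitional) SpreadsheetML namespace.
# Files saved as "Strict Open XML" use a different namespace URI.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_sheet_names(xlsx_file):
    """Return the sheet names in workbook order.

    Hypothetical helper: only the small xl/workbook.xml part is
    decompressed and parsed; the worksheet parts are never opened.
    `xlsx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        # Assumption: the workbook part lives at the conventional path.
        with archive.open("xl/workbook.xml") as workbook_xml:
            root = ET.parse(workbook_xml).getroot()
    # Each <sheet> element carries the visible name in its "name" attribute.
    return [sheet.attrib["name"] for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

Calling list_sheet_names('sheet_filename.xlsx') should return the same list as pd.ExcelFile(...).sheet_names for a typical workbook, while only parsing a few kilobytes of XML. It does not address the column names, which live in the (potentially large) worksheet parts.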