The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame
with pandas.read_excel(file)
command. Under the hood, it uses xlrd library which does not support ods files.
Is there an equivalent of pandas.read_excel
for ods files? If not, how can I do the same for an Open Document Formatted spreadsheet (ods file)? ODF is used by LibreOffice and OpenOffice.
You can read ODF (Open Document Format . ods ) documents in Python using the following modules: odfpy / read-ods-with-odfpy. ezodf.
This is available natively in pandas 0.25. So long as you have odfpy installed (conda install odfpy OR pip install odfpy) you can do
pd.read_excel("the_document.ods", engine="odf")
You can read ODF (Open Document Format .ods
) documents in Python using the following modules:
Using ezodf, a simple ODS-to-DataFrame converter could look like this:
import pandas as pd
import ezodf
doc = ezodf.opendoc('some_odf_spreadsheet.ods')
print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))
for sheet in doc.sheets:
print("-"*40)
print(" Sheet name : '%s'" % sheet.name)
print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )
# convert the first sheet to a pandas.DataFrame
sheet = doc.sheets[0]
df_dict = {}
for i, row in enumerate(sheet.rows()):
# row is a list of cells
# assume the header is on the first row
if i == 0:
# columns as lists in a dictionary
df_dict = {cell.value:[] for cell in row}
# create index for the column headers
col_index = {j:cell.value for j, cell in enumerate(row)}
continue
for j, cell in enumerate(row):
# use header instead of column index
df_dict[col_index[j]].append(cell.value)
# and convert to a DataFrame
df = pd.DataFrame(df_dict)
P.S.
ODF spreadsheet (*.ods files) support has been requested on the pandas
issue tracker: https://github.com/pydata/pandas/issues/2311, but it is still not implemented.
ezodf
was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in this pandas
fork.
Here is a quick and dirty hack which uses ezodf module:
import pandas as pd
import ezodf
def read_ods(filename, sheet_no=0, header=0):
tab = ezodf.opendoc(filename=filename).sheets[sheet_no]
return pd.DataFrame({col[header].value:[x.value for x in col[header+1:]]
for col in tab.columns()})
Test:
In [92]: df = read_ods(filename='fn.ods')
In [93]: df
Out[93]:
a b c
0 1.0 2.0 3.0
1 4.0 5.0 6.0
2 7.0 8.0 9.0
NOTES:
header
, skiprows
, index_col
, parse_cols
are NOT implemented in this function - feel free to update this question if you want to implement themezodf
depends on lxml
make sure you have it installedIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With