I'm trying to read an Excel file in .xlsx format with the csv module, but I'm not having any luck, even with the dialect and encoding specified. Below are my attempts and the errors I got with each encoding I tried. If anyone could point me to the correct code, syntax, or module for reading a .xlsx file in Python, I'd appreciate it.
With the below code, I get the following error: _csv.Error: line contains NULL byte
#!/usr/bin/python
import sys, csv
with open('filelocation.xlsx', "r+", encoding="Latin1") as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)
With the below code, I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 16: invalid continuation byte
#!/usr/bin/python
import sys, csv
with open('filelocation.xlsx', "r+", encoding="utf-8") as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)
When I use utf-16 as the encoding, I get the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 570-571: illegal UTF-16 surrogate
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel.
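That description also explains the errors above: a .xlsx file is not CSV text at all but a ZIP archive of XML parts, so no text encoding will make it decode cleanly. A minimal sketch of how one might check this with the standard library; the function name looks_like_xlsx is made up for illustration, and it assumes the conventional layout in which the workbook parts live under an xl/ folder:

```python
import io
import zipfile


def looks_like_xlsx(file_obj):
    """Return True if a binary stream is a ZIP archive with parts under xl/."""
    if not zipfile.is_zipfile(file_obj):
        return False
    file_obj.seek(0)
    with zipfile.ZipFile(file_obj) as archive:
        # Assumption: workbook parts are stored under the "xl/" folder.
        return any(name.startswith("xl/") for name in archive.namelist())


# Build a tiny stand-in archive in memory (not a complete workbook).
fake_xlsx = io.BytesIO()
with zipfile.ZipFile(fake_xlsx, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
fake_xlsx.seek(0)

zip_result = looks_like_xlsx(fake_xlsx)
text_result = looks_like_xlsx(io.BytesIO(b"a,b,c\r\n1,2,3\r\n"))
print(zip_result, text_result)
```

For a real file, you would open it in binary mode, e.g. open('filelocation.xlsx', 'rb'), and pass the file object in.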
The read_excel() function of pandas can also be used to read a .xlsx file. In a script it would read a file such as sales.xlsx, and DataFrame() can then hold the content of the file in a data frame stored in a variable such as data.
You cannot use Python's csv library for reading xlsx formatted files. You need to install and use a different library. For example, you could use openpyxl as follows:
import openpyxl
wb = openpyxl.load_workbook("filelocation.xlsx")
ws = wb.active
for row in ws.iter_rows(values_only=True):
    print(row)
This would display all of the rows in the file, each as a tuple of cell values. The Python Excel website gives other possible examples.
Alternatively you could create a list of rows:
import openpyxl
wb = openpyxl.load_workbook("filelocation.xlsx")
ws = wb.active
data = list(ws.iter_rows(values_only=True))
print(data)
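If the goal is to end up with CSV after all, rows in this shape can be handed to the csv module for writing. A minimal sketch, assuming rows shaped like the output of ws.iter_rows(values_only=True); the helper name write_rows_as_csv and the sample values are made up for illustration:

```python
import csv
import io


def write_rows_as_csv(rows, out_file):
    """Write an iterable of row tuples to an open text file as CSV."""
    writer = csv.writer(out_file, dialect="excel")
    for row in rows:
        # Empty cells arrive as None; write them as empty fields.
        writer.writerow(["" if cell is None else cell for cell in row])


# Stand-in for list(ws.iter_rows(values_only=True)); values are invented.
sample_rows = [("name", "qty"), ("apple", 3), ("pear", None)]

buffer = io.StringIO()
write_rows_as_csv(sample_rows, buffer)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

When writing to a real file instead of a StringIO, open it with newline='' as the csv documentation recommends, e.g. open('out.csv', 'w', newline='').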
Note: If you are using the older Excel format .xls, you could instead use the xlrd library. This no longer supports the .xlsx format though.
import xlrd
workbook = xlrd.open_workbook("filelocation.xls")
sheet = workbook.sheet_by_index(0)
data = [sheet.row_values(rowx) for rowx in range(sheet.nrows)]
print(data)
Here's a very very rough implementation using just the standard library.
def xlsx(fname, sheet=1):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet%s.xml' % sheet)):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            column_name = ''.join(x for x in el.attrib['r'] if not x.isdigit())  # AZ22
            row[column_name] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows
(This is copied from a deleted question: https://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python )
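The function above returns each row as a dict keyed by column letters ("A", "B", ..., "AZ"), so cells that are empty in the sheet are simply missing. If you would rather have positional lists, something like the following could convert them. This is only a sketch; the helper names column_index and row_dict_to_list are made up here and are not part of the original answer:

```python
def column_index(letters):
    """Convert Excel column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def row_dict_to_list(row):
    """Turn {"A": x, "C": z} into [x, None, z], filling missing cells with None."""
    if not row:
        return []
    width = max(column_index(col) for col in row)
    values = [None] * width
    for col, value in row.items():
        values[column_index(col) - 1] = value
    return values


# Example row in the shape produced by the function above; values are invented.
example = row_dict_to_list({"A": "x", "C": "z"})
print(example)
```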