Read in .xlsx with csv module in python

I'm trying to read an Excel file in .xlsx format with the csv module, but I'm not having any luck, even with the dialect and encoding specified. Below are my attempts and the errors I got with each encoding I tried. If anyone could point me to the correct code, syntax, or module for reading a .xlsx file in Python, I'd appreciate it.

With the below code, I get the following error: _csv.Error: line contains NULL byte

#!/usr/bin/python

import sys, csv

with open('filelocation.xlsx', "r+", encoding="Latin1")  as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)

With the below code, I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 16: invalid continuation byte

#!/usr/bin/python

import sys, csv

with open('filelocation.xlsx', "r+", encoding="utf-8")  as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)

When I use utf-16 in the encoding, I get the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 570-571: illegal UTF-16 surrogate

asked Mar 02 '16 10:03

pHorseSpec

2 Answers

You cannot use Python's csv library for reading xlsx formatted files. You need to install and use a different library. For example, you could use openpyxl as follows:

import openpyxl

wb = openpyxl.load_workbook("filelocation.xlsx")
ws = wb.active

for row in ws.iter_rows(values_only=True):
    print(row)

This would display all of the rows in the active worksheet as tuples of cell values. The Python Excel website gives other possible examples.

Alternatively you could create a list of rows:

import openpyxl

wb = openpyxl.load_workbook("filelocation.xlsx")
ws = wb.active

data = list(ws.iter_rows(values_only=True))

print(data)
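Since the original goal was to work with the data through the csv module, one option is to write the rows out as CSV once they have been read. The following is a minimal sketch and not part of the original answer: `rows_to_csv` is a hypothetical helper name, and the sample `data` is a stand-in for the list built above, assuming empty cells come back as `None`.

```python
import csv
import io


def rows_to_csv(rows, out):
    """Write an iterable of row tuples/lists to the text stream `out` as CSV."""
    writer = csv.writer(out, dialect="excel")
    # csv.writer writes None as an empty field, so empty cells need no special handling.
    writer.writerows(rows)


# Stand-in for: data = list(ws.iter_rows(values_only=True))
data = [("name", "qty"), ("apple", 3), ("pear", None)]

buffer = io.StringIO()
rows_to_csv(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("out.csv", "w", newline="")`, as the csv documentation recommends.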

Note: If you are using the older Excel format .xls, you could instead use the xlrd library. Since version 2.0, xlrd no longer supports the .xlsx format.

import xlrd

workbook = xlrd.open_workbook("filelocation.xls")
sheet = workbook.sheet_by_index(0)
data = [sheet.row_values(rowx) for rowx in range(sheet.nrows)]

print(data)
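If it is unclear which of the two formats a file really is, the first few bytes usually tell: an .xlsx file is a ZIP archive of XML parts, and an .xls file is an OLE2 compound file. Both are binary, which is why the csv module reports NULL bytes and decoding errors. The sketch below is an illustration rather than part of the original answer; `sniff_excel_format` is a hypothetical helper, and a ZIP signature only shows that the file is an archive, not that it is a valid workbook.

```python
import io

ZIP_MAGIC = b"PK\x03\x04"                         # start of a ZIP archive (.xlsx)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # start of an OLE2 compound file (.xls)


def sniff_excel_format(stream):
    """Guess the workbook format from the first bytes of a binary stream."""
    header = stream.read(8)
    if header.startswith(ZIP_MAGIC):
        return "xlsx"     # ZIP container: try openpyxl
    if header.startswith(OLE2_MAGIC):
        return "xls"      # legacy binary workbook: try xlrd
    return "unknown"      # possibly plain text such as CSV


# In-memory stand-ins for real files opened with open(path, "rb")
fmt_zip = sniff_excel_format(io.BytesIO(ZIP_MAGIC + b"rest of archive"))
fmt_ole = sniff_excel_format(io.BytesIO(OLE2_MAGIC + b"rest of file"))
fmt_txt = sniff_excel_format(io.BytesIO(b"a,b,c\n1,2,3\n"))
print(fmt_zip, fmt_ole, fmt_txt)
```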

answered Oct 14 '22 03:10

Martin Evans

Here's a very very rough implementation using just the standard library.

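import zipfile
from xml.etree.ElementTree import iterparse


def xlsx(fname, sheet=1):
    """Return the rows of one worksheet as a list of {column letters: value} dicts."""
    with zipfile.ZipFile(fname) as z:
        # Shared strings: one entry per <si>; a string may be split over several <t> runs.
        strings = []
        if 'xl/sharedStrings.xml' in z.namelist():  # absent when the workbook has no strings
            with z.open('xl/sharedStrings.xml') as f:
                for _, el in iterparse(f):
                    if el.tag.endswith('}si'):
                        strings.append(''.join(t.text or '' for t in el.iter() if t.tag.endswith('}t')))
        rows = []
        row = {}
        value = ''
        with z.open('xl/worksheets/sheet%s.xml' % sheet) as f:
            for _, el in iterparse(f):
                if el.tag.endswith('}v'):  # <v>84</v>
                    value = el.text
                elif el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
                    if el.attrib.get('t') == 's':  # value is an index into the shared strings
                        value = strings[int(value)]
                    column_name = ''.join(x for x in el.attrib['r'] if not x.isdigit())  # "AZ22" -> "AZ"
                    row[column_name] = value
                    value = ''
                elif el.tag.endswith('}row'):  # end of a row: store it and start a new one
                    rows.append(row)
                    row = {}
    return rows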

(This is copied from a deleted question: https://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python )
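The function above returns each row as a dict keyed by column letters ("A", "B", ..., "AZ"), so empty cells are simply missing. To feed such rows to something that expects positional columns, the letters have to be turned into indexes. The helpers below are a hedged sketch of that step; the names `split_cell_ref`, `column_index`, and `row_dict_to_list` are hypothetical and do not come from the original answer.

```python
import re

CELL_REF = re.compile(r"([A-Z]+)([0-9]+)")


def split_cell_ref(ref):
    """Split a reference such as 'AZ22' into ('AZ', 22)."""
    match = CELL_REF.fullmatch(ref)
    if match is None:
        raise ValueError("not a simple cell reference: %r" % ref)
    letters, digits = match.groups()
    return letters, int(digits)


def column_index(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def row_dict_to_list(row):
    """Turn {'A': x, 'C': y} into [x, None, y], padding missing columns with None."""
    if not row:
        return []
    width = max(column_index(name) for name in row)
    values = [None] * width
    for name, value in row.items():
        values[column_index(name) - 1] = value
    return values


print(split_cell_ref("AZ22"), column_index("AZ"), row_dict_to_list({"A": "x", "C": "y"}))
```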

answered Oct 14 '22 03:10

Collin Anderson

Tags: python, excel, encoding, utf-8
