Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to open excel file fast in Python?

Tags:

python

excel

I am now using PyExcelerator for reading excel files, but it is extremely slow. As I always need to open excel files more than 100MB, it takes me more than twenty minutes to only load one file.

The functionality I need are:

  • Open Excel Files, Select Specific Tables, And Load Them Into a Dict or List object.
  • Sometimes: Select Specific Columns And Only Load Whole Lines Which Have the Specific Columns in Specific Values.
  • Read Excel Files With Password Protected.

And the code I am using now is:

book = pyExcelerator.parse_xls(filepath)
parsed_dictionary = defaultdict(lambda: '', book[0][1])
number_of_columns = 44
result_list = []
number_of_rows = 500000
for i in range(0, number_of_rows):
    ok = False
    result_list.append([])
    for h in range(0, number_of_columns):
        item = parsed_dictionary[i,h]
        if type(item) is StringType or type(item) is UnicodeType:
            item = item.replace("\t","").strip()
        result_list[i].append(item)
        if item != '':
            ok = True
    if not ok:
        break

Any suggestions?

like image 540
Felix Yan Avatar asked May 03 '11 04:05

Felix Yan


People also ask

How to read an Excel file using Python?

Reading an excel file using Python. Using xlrd module, one can retrieve information from a spreadsheet. For example, reading, writing or modifying the data can be done in Python. Also, user might have to go through various sheets and retrieve data based on some criteria or modify some rows and columns and do a lot of work. xlrd module is used...

How long does it take to import Excel into Python?

A simple way to import Excel files in Python. It takes around 50 seconds to run. Pretty slow. Let’s now imagine that we saved these files as .csv (rather than .xlsx) from our ERP/System/SAP. Importing csv files in Python is 100x faster than Excel files. We can now load these files in 0.63 seconds. That’s nearly 10 times faster!

Why use Python for handling large Excel files?

This workbooks are slow and tend to crash after some calculation. So I started to use python for handling large excel files which offer another big advantage: You create code which is reproducible and provide documentation as well. Let´s jump in! The file we want to process contains nearly 1 million rows and 16 columns:

What is it about it that makes Excel so fast?

It essentially tries to "stream" Excel files like Python does with normal files and is therefore very fast when you only need a subset of data (esp. if it is near the beginning of the file). Thanks for contributing an answer to Stack Overflow!


2 Answers

pyExcelerator appears not to be maintained. To write xls files, use xlwt, which is a fork of pyExcelerator with bug fixes and many enhancements. The (very basic) xls reading capability of pyExcelerator was eradicated from xlwt. To read xls files, use xlrd.

If it's taking 20 minutes to load a 100MB xls file, you must be using one or more of: a slow computer, a computer with very little available memory, or an older version of Python.

Neither pyExcelerator nor xlrd read password-protected files.

Here's a link that covers xlrd and xlwt.

Disclaimer: I'm the author of xlrd and maintainer of xlwt.

like image 195
John Machin Avatar answered Sep 20 '22 03:09

John Machin


xlrd is pretty good for reading files and xlwt is pretty good for writing. Both superior to pyExcelerator in my experience.

like image 21
spulec Avatar answered Sep 21 '22 03:09

spulec