I'm working on an application that processes huge Excel 2007 files, and I'm using OpenPyXL to do it. OpenPyXL has two different methods of reading an Excel file - one "normal" method where the entire document is loaded into memory at once, and one method where iterators are used to read row-by-row.
The problem is that when I'm using the iterator method, I don't get any document meta-data like column widths and row/column count, and i really need this data. I assume this data is stored in the Excel document close to the top, so it shouldn't be necessary to load the whole 10MB file into memory to get access to it.
So, is there a way to get ahold of the row/column count and column widths without loading the entire document into memory first?
After this we should import openpyxl in our code and then we should be ready to interact with excel. To get the count of the occupied rows in a worksheet, first of all we need to load the entire workbook by specifying the path where it is located. This is achieved with load_workbook() method.
To find the max row and column number from your Excel sheet in Python, use sheet. max_row and sheet.
Adding on to what Hubro said, apparently get_highest_row()
has been deprecated. Using the max_row
and max_column
properties returns the row and column count. For example:
wb = load_workbook(path, use_iterators=True) sheet = wb.worksheets[0] row_count = sheet.max_row column_count = sheet.max_column
Taking a look at the source code of OpenPyXL (IterableWorksheet) I've figured out how to get the column and row count from an iterator worksheet:
wb = load_workbook(path, use_iterators=True) sheet = wb.worksheets[0] row_count = sheet.get_highest_row() - 1 column_count = letter_to_index(sheet.get_highest_column()) + 1
IterableWorksheet.get_highest_column
returns a string with the column letter that you can see in Excel, e.g. "A", "B", "C" etc. Therefore I've also written a function to translate the column letter to a zero based index:
def letter_to_index(letter): """Converts a column letter, e.g. "A", "B", "AA", "BC" etc. to a zero based column index. A becomes 0, B becomes 1, Z becomes 25, AA becomes 26 etc. Args: letter (str): The column index letter. Returns: The column index as an integer. """ letter = letter.upper() result = 0 for index, char in enumerate(reversed(letter)): # Get the ASCII number of the letter and subtract 64 so that A # corresponds to 1. num = ord(char) - 64 # Multiply the number with 26 to the power of `index` to get the correct # value of the letter based on it's index in the string. final_num = (26 ** index) * num result += final_num # Subtract 1 from the result to make it zero-based before returning. return result - 1
I still haven't figured out how to get the column sizes though, so I've decided to use a fixed-width font and automatically scaled columns in my application.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With