Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Openpyxl max_row and max_column wrongly reports a larger figure

My query is to do with a function that is part of a parsing script Im developing. I am trying to write a python function to find the column number corresponding to a matched value in excel. The excel has been created on the fly with openpyxl, and it has the first row (from 3rd column) headers that each span 4 columns merged into one. In my subsequent function, I am parsing some content to be added to the columns corresponding to the matching headers. (Additional info: The content I'm parsing is blast+ output. I'm trying to create a summary spreadsheet with the hit names in each column with subcolumns for hits, gaps, span and identity. The first two columns are query contigs and its length. )

I had initially written a similar function for xlrd and it worked. But when I try to rewrite it for openpyxl, I find that the max_row and max_col function wrongly returns a larger number of rows and columns than actually present. For instance, I have 20 rows for this pilot input, but it reports it as 82. Note that I manually selected the empty rows & columns and right clicked and deleted them, as advised elsewhere in this forum. This didn't change the error.

def find_column_number(x):
    col = 0
    print "maxrow = ", hrsh.max_row
    print "maxcol = ", hrsh.max_column
    for rowz in range(hrsh.max_row):
        print "now the row is ", rowz
        if(rowz > 0): 
            pass
        for colz in range(hrsh.max_column):
            print "now the column is ", colz
            name = (hrsh.cell(row=rowz,column=colz).value)
            if(name == x):
                col = colz
    return col 

The issue with max_row and max_col, has been discussed here https://bitbucket.org/openpyxl/openpyxl/issues/514/cell-max_row-reports-higher-than-actual I applied the suggestion here. But the max_row is still wrong.

for row in reversed(hrsh.rows):
    values = [cell.value for cell in row]
    if any(values):
        print("last row with data is {0}".format(row[0].row))
        maxrow = row[0].row

I then tried the suggestion at https://www.reddit.com/r/learnpython/comments/3prmun/openpyxl_loop_through_and_find_value_of_the/, and tried to get the column values. Once, again the script takes into account the empty columns and reports a higher number columns than actually present.

for currentRow in hrsh.rows:
    for currentCell in currentRow:
        print(currentCell.value)

Can you please help me resolve this error, or suggest another method to achieve my aim?

like image 741
Brindha Lekshmisaran Avatar asked Oct 04 '17 16:10

Brindha Lekshmisaran


1 Answers

As noted in the bug report you linked to there's a difference between a sheet's reported dimensions and whether these include empty rows or columns. If max_row and max_column are not reporting what you want to see then you will need to write your own code to find the first completely empty. The most efficient way, of course, would be to start from max_row and work backwards but the following is probably sufficient:

for max_row, row in enumerate(ws, 1):
    if all(c.value is None for c in row):
        break
like image 81
Charlie Clark Avatar answered Nov 14 '22 22:11

Charlie Clark