
Python Find highest row in a given column

I'm quite new to Stack Overflow and only recently learned some basic Python. This is the first time I'm using openpyxl. Previously I used xlrd and xlsxwriter and managed to make some useful programs, but now I need something that can both read and write .xlsx files.

There is a file that I need to read and then edit with data already stored in the code. Suppose the .xlsx has five columns with data: A, B, C, D, E. Column A has over 1000 rows of data. Column D has 150 rows of data.

Basically, I want the program to find the last row with data on a given column (say D). Then, write the stored variable data in the next available row (last row + 1) in column D.

The problem is that I can't use ws.get_highest_row() because it returns the row 1000 on column A.

Basically, so far this is all I've got:

data = 'xxx'
from openpyxl import load_workbook
wb = load_workbook('book.xlsx', use_iterators=True)
ws = wb.get_sheet_by_name('Sheet1')
last_row = ws.get_highest_row()

Obviously this doesn't do what I need: last_row is 1000, not the last row of column D.

egodial asked Jul 03 '15 18:07



1 Answer

The problem is that get_highest_row() itself uses RowDimension instances to determine the maximum row in the sheet. A RowDimension has no information about columns, which means we cannot use it to solve your problem and have to approach it differently.

Here is one somewhat "ugly", openpyxl-specific option. Note that it relies on a private attribute and will not work if the workbook is loaded with use_iterators=True:

from openpyxl.utils import coordinate_from_string

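def get_maximum_row(ws, column):
    # ws._cells is private; this assumes its keys are coordinate strings like "D150".
    # Compare the parsed column letters: startswith("A") would also match "AA1".
    coords = (coordinate_from_string(cell) for cell in ws._cells)
    return max(row for col, row in coords if col == column)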

Usage:

print(get_maximum_row(ws, "A"))
print(get_maximum_row(ws, "B"))
print(get_maximum_row(ws, "C"))
print(get_maximum_row(ws, "D"))
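If you would rather not touch a private attribute, here is a minimal sketch of another way to do it, assuming a reasonably recent openpyxl where the worksheet exposes max_row and cell(row=..., column=..., value=...). The names last_row_with_data and append_to_column are made up for this example and are not part of openpyxl. The idea is to start at the sheet-wide maximum row and walk upward in the chosen column until a non-empty cell turns up, then write into the row right below it.

```python
def last_row_with_data(ws, column):
    """Return the highest row in `column` (1-based index) that holds a value, or 0."""
    # ws.max_row is the sheet-wide maximum (1000 in the question), so walk
    # upward from there until this particular column has a non-empty cell.
    for row in range(ws.max_row, 0, -1):
        if ws.cell(row=row, column=column).value is not None:
            return row
    return 0  # the column is completely empty


def append_to_column(path, sheet_name, column, value):
    """Write `value` into the first free row below the data in `column`, then save.

    Hypothetical helper for illustration; assumes openpyxl is installed.
    """
    # Imported here so that last_row_with_data stays free of dependencies.
    from openpyxl import load_workbook

    # Load in normal mode (not read-only / iterator mode) so cells can be edited.
    wb = load_workbook(path)
    ws = wb[sheet_name]
    target_row = last_row_with_data(ws, column) + 1
    ws.cell(row=target_row, column=column, value=value)
    wb.save(path)
    return target_row
```

With the workbook from the question, append_to_column('book.xlsx', 'Sheet1', 4, data) should write data into D151 (column 4 is D). Be aware that ws.cell() creates cell objects for any empty cells it visits, so the scan is simple rather than memory-efficient on very large sheets.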

Aside from this, I would follow @LondonRob's suggestion to parse the contents with pandas and let it do the job.

alecxe answered Oct 23 '22 10:10