Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to highlight rows based on content in Excel Dataframe?

I have an excel file containing data like the one to left, and I'm trying to format it to get the data formatting as table to the right.

enter image description here

With my current code I'm able to format all rows containig headers (H1, H2,...)

This is the content of file.xlsx:

This is my current code:

import pandas as pd
import numpy as np

from xlsxwriter.utility import xl_rowcol_to_cell

data = {'H1': {0: 'A', 1: '', 2: 'H1', 3: 'A', 4: '', 5: 'H1', 6: 'A', 7: 'A', 8: 'B', 9: 'B', 10: 'B', 11: '', 12: 'H1', 13: 'B', 14: 'B', 15: '', 16: 'H1', 17: 'C', 18: 'C', 19: 'C', 20: 'D', 21: 'D', 22: ''}, 'H2': {0: 'Rty', 1: '', 2: 'H2', 3: 'Rty', 4: '', 5: 'H2', 6: 'Rty', 7: 'Rty', 8: 'Rty', 9: 'Rty', 10: 'Rty', 11: '', 12: 'H2', 13: 'Rty', 14: 'Rty', 15: '', 16: 'H2', 17: 'Rty', 18: 'Rty', 19: 'Rty', 20: 'Rty', 21: 'Rty', 22: ''}, 'H3': {0: '1195', 1: '', 2: 'H3', 3: '1195', 4: '', 5: 'H3', 6: '1195', 7: '1195', 8: '1195', 9: '1195', 10: '1195', 11: '', 12: 'H3', 13: '1195', 14: '1195', 15: '', 16: 'H3', 17: '1195', 18: '1195', 19: '1195', 20: '1195', 21: '1195', 22: ''}, 'H4': {0: '9038', 1: 'H3=9038, 000', 2: 'H4', 3: '1355', 4: 'H3=1355, 363', 5: 'H4', 6: '2022', 7: '2022', 8: '2022', 9: '2022', 10: '2022', 11: 'H3=2022, 234', 12: 'H4', 13: '2564', 14: '2564', 15: 'H3=2564, 726', 16: 'H4', 17: '1501', 18: '1501', 19: '1501', 20: '1501', 21: '1501', 22: 'H3=1501, 143'}, 'H5': {0: '1537', 1: '', 2: 'H5', 3: '8', 4: '', 5: 'H5', 6: '59', 7: '78', 8: '76', 9: '6', 10: '31', 11: '', 12: 'H5', 13: '71', 14: '17', 15: '', 16: 'H5', 17: '72', 18: '89', 19: '47', 20: '32', 21: '233', 22: ''}}
df = pd.DataFrame.from_dict(data)


writer = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name='Output')

workbook = writer.book
worksheet = writer.sheets['Output']
number_rows = len(df.index)

format1 = workbook.add_format({'bg_color': 'black', 'font_color': 'yellow'})

for r in range(0,number_rows):
    if df.iat[r,0] == "H1":
        worksheet.set_row(r+1, None, format1) 

writer.save()

This is my current output:

enter image description here

I'm stuck in how to limit the format from column A to E and in how to intarcalate color green, yellow, green yello depending when values in column A changes. I mean, for all consecutive values in column A = "A" highlight in green, when changes highlight to yellow when changes highlightto green again and so on.

How can I do this? Thanks in advance.

like image 872
Ger Cas Avatar asked Jul 21 '20 08:07

Ger Cas


2 Answers

You can do it with different excel library, like openpyxl

You can format each cell separately For example:

from openpyxl import Workbook
from openpyxl.styles import Font, Color, colors, fills
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook()
ws = wb.active

for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)

a1 = ws['A1']
a1.font = Font(color="FF0000")
a1.fill = fills.PatternFill(patternType='solid', fgColor=Color(rgb='00FF00'))
wb.save("pandas_openpyxl.xlsx")

They have great documentation here: https://openpyxl.readthedocs.io/en/stable/pandas.html

like image 159
Elkana Bardugo Avatar answered Oct 11 '22 21:10

Elkana Bardugo


If you want to continue using the xlsxwriter Python module you may use the write(…) method on the worksheet object to set the contents and formatting of a cell in one step.

You will have to break up your to_excel() method and write each DataFrame value individually in a loop.

Example cell creation and formatting call:

cell_format = workbook.add_format({'bold': True, 'italic': True})

# inside a loop iterating over your DataFrame
worksheet.write(row, column, value, cell_format)  # Cell is bold and italic.
like image 40
JimmyNJ Avatar answered Oct 11 '22 21:10

JimmyNJ