
Get Excel cell background color in pandas read_excel?

I have an Excel file with cells having background colors. I am reading that file into pandas with read_excel. Is there any way to get the background colors of cells?

csaladenes asked Dec 17 '17 16:12



2 Answers

Brute-forced it through xlrd, as per Mark's suggestion:

import numpy as np
import pandas as pd
from xlrd import open_workbook

# formatting_info=True is needed to load cell formatting (.xls files only)
wb = open_workbook('wb.xls', formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
# create an empty color mask matrix
bgcol = np.zeros([sheet.nrows, sheet.ncols])
# cycle through all cells to get their background color indices
for row in range(sheet.nrows):
    for column in range(sheet.ncols):
        cell = sheet.cell(row, column)
        fmt = wb.xf_list[cell.xf_index]
        bgcol[row, column] = fmt.background.background_colour_index
# return a pandas mask of color indices
colormask = pd.DataFrame(bgcol)
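
The values collected above are palette indices rather than colors. As a minimal sketch, assuming the workbook was opened with formatting_info=True so that xlrd populates wb.colour_map, the indices could be translated to RGB tuples like this. The helper name indices_to_rgb is hypothetical, and indices that map to a system or default color may come back as None.

```python
def indices_to_rgb(index_rows, colour_map):
    """Map each colour index to an (R, G, B) tuple, or None if unmapped.

    index_rows: 2-D iterable of colour indices (e.g. the bgcol array).
    colour_map: dict of index -> RGB tuple or None (e.g. wb.colour_map
                from xlrd; assumed to be populated via formatting_info=True).
    """
    # int() is needed because the numpy mask stores the indices as floats
    return [[colour_map.get(int(idx)) for idx in row] for row in index_rows]


# Possible usage with the workbook and mask from the snippet above:
# rgb_rows = indices_to_rgb(bgcol, wb.colour_map)
```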

Yet, there must be a better way through pandas directly...

csaladenes answered Sep 16 '22 20:09


The solution suggested above works only for xls files, not for xlsx files. With an xlsx file it raises NotImplementedError: formatting_info=True not yet implemented. The xlrd library has not been updated to support this for xlsx files, so you would have to Save As and change the format every time, which may not work for you.
Here is a solution for xlsx files using the openpyxl library. A2 is the cell whose color code we need to find.

from openpyxl import load_workbook

excel_file = 'color_codes.xlsx'
wb = load_workbook(excel_file, data_only=True)
sh = wb['Sheet1']
# For a plain RGB fill this is an 8-digit ARGB hex string such as 'FFFF0000';
# for theme or indexed colors openpyxl returns an integer instead
color_in_hex = sh['A2'].fill.start_color.index
print('HEX =', color_in_hex)
if isinstance(color_in_hex, str):
    # skip the first two hex digits (alpha) and convert RRGGBB to an RGB tuple
    print('RGB =', tuple(int(color_in_hex[i:i+2], 16) for i in (2, 4, 6)))
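
To get a color mask for the whole sheet rather than a single cell, the same openpyxl lookup can be applied to every cell and collected into a DataFrame. This is only a sketch: the names argb_to_rgb, fill_color_mask and read_excel_with_colors are hypothetical, and the two frames returned by read_excel_with_colors may differ in shape if the sheet contains trailing cells that are formatted but empty.

```python
import pandas as pd
from openpyxl import load_workbook


def argb_to_rgb(color):
    """Convert an 8-digit ARGB hex string to an (R, G, B) tuple.

    Returns None for anything else, e.g. the integers openpyxl reports
    for theme or indexed colors.
    """
    if not isinstance(color, str) or len(color) != 8:
        return None
    # Skip the first two hex digits (alpha), then read RR, GG, BB
    return tuple(int(color[i:i + 2], 16) for i in (2, 4, 6))


def fill_color_mask(ws):
    """Return a DataFrame with the fill color of every cell in worksheet ws."""
    rows = [[cell.fill.start_color.index for cell in row]
            for row in ws.iter_rows()]
    return pd.DataFrame(rows)


def read_excel_with_colors(path, sheet_name):
    """Read cell values with pandas and fill colors with openpyxl."""
    # header=None keeps the first sheet row as data, so rows line up
    values = pd.read_excel(path, sheet_name=sheet_name, header=None)
    colors = fill_color_mask(load_workbook(path)[sheet_name])
    return values, colors
```

Unfilled cells show up in the mask as '00000000', so something like colors != '00000000' could serve as a boolean mask over the values frame.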
Sumit Pokhrel answered Sep 16 '22 20:09