Is there any way to read out the color-index of cells from excel files with R?
While I can set the cell color with packages like XLConnect
or XLSX
, I have found no way to extract the color-information from existing workbooks.
Importing Excel files into R using readxl packageThe readxl package, developed by Hadley Wickham, can be used to easily import Excel files (xls|xlsx) into R without any external dependencies.
For importing multiple Excel sheets into R, we have to, first install a package in R which is known as readxl. After successfully installing the package, we have to load the package using the library function is R.
Using R to automate Excel is an awesome skill for automating your work (and life). Your company lives off Excel files. Why not automate them & save some time?
R-Bloggers provided a function that will do the job for you. I am including the answer here for future reference.
Read the excel file using xlsx
package:
library(xlsx)
wb <- loadWorkbook("test.xlsx")
sheet1 <- getSheets(wb)[[1]]
# get all rows
rows <- getRows(sheet1)
cells <- getCells(rows)
This part extracts the information that later will be used for getting background color (or other style information) of the cells:
styles <- sapply(cells, getCellStyle) #This will get the styles
This is the function
that identifies/extracts the cell background color:
cellColor <- function(style)
{
fg <- style$getFillForegroundXSSFColor()
rgb <- tryCatch(fg$getRgb(), error = function(e) NULL)
rgb <- paste(rgb, collapse = "")
return(rgb)
}
error
will handle the cells with no background color.
Using sapply
you can get the background color for all of the cells:
sapply(styles, cellColor)
You can also categorize/identify them by knowing the RGb codes:
mycolor <- list(green = "00ff00", red = "ff0000")
m <- match(sapply(styles, cellColor), mycolor)
labs <-names(mycolor)[m]
You can read more and learn how to apply it at R-bloggers
You can get the RGB codes from RapidTables.com
Old question but maybe it can help someone in the future.
There is a strange behavior in the POI (java) library (at least on my computer). It is not getting the colors correctly. The code provided in the @M--'s answer works well when the color is a basic color (indexed color), but does not work when the color is, for example, in grayscale. To get around you can use the following code using the getTint ()
function. Tint is a number between -1 (dark) and 1 (light), and combining it with the RGB (getRgb ()
) function, you can completely recover the color.
cell_color <- function(style){
fg <- style$getFillForegroundXSSFColor()
hex <- tryCatch(fg$getRgb(), error = function(e) NULL)
hex <- paste0("#", paste(hex, collapse = ""))
tint <- tryCatch(fg$getTint(), error = function(e) NULL)
if(!is.null(tint) & !is.null(hex)){ # Tint varies between -1 (dark) and 1 (light)
rgb_col <- col2rgb(col = hex)
if(tint < 0) rgb_col <- (1-abs(tint))*rgb_col
if(tint > 0) rgb_col <- rgb_col + (255-rgb_col)*tint
hex <- rgb(red = rgb_col[1, 1],
green = rgb_col[2, 1],
blue = rgb_col[3, 1],
maxColorValue = 255)
}
return(hex)
}
Some references to help:
https://poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/HSSFExtendedColor.html#getTint--
https://bz.apache.org/bugzilla/show_bug.cgi?id=50787
Getting Excel fill colors using Apache POI
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With