
Using R to read cell color information from Excel files

Is there any way to read the color index of cells from Excel files with R?

I can set the cell color with packages like XLConnect or xlsx, but I have found no way to extract the color information from existing workbooks.

Bens asked Mar 23 '17 16:03



2 Answers

R-Bloggers provided a function that will do the job for you. I am including the answer here for future reference.

Read the Excel file using the xlsx package:

library(xlsx)
wb     <- loadWorkbook("test.xlsx")
sheet1 <- getSheets(wb)[[1]]

# get all rows
rows  <- getRows(sheet1)
cells <- getCells(rows)

This part extracts the style information that is later used to get the background color (or other style properties) of the cells:

styles <- sapply(cells, getCellStyle) #This will get the styles

This is the function that identifies/extracts the cell background color:

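cellColor <- function(style) {
  fg  <- style$getFillForegroundXSSFColor()
  # getRgb() fails for cells with no fill, so fall back to NULL
  rgb <- tryCatch(fg$getRgb(), error = function(e) NULL)
  # Collapse the raw bytes into one hex string such as "ff0000"
  paste(rgb, collapse = "")
}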

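The error handler in tryCatch covers cells with no background color: the call fails for them, the handler returns NULL, and the function returns an empty string.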
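The two base R behaviors that the function relies on can be checked without a workbook. This is only a small sketch: the byte values and the failing_call helper are made-up stand-ins for what getRgb() returns or throws, not output from a real file.

```r
# Stand-in for the raw bytes of a pure green fill (assumed values)
green_bytes <- as.raw(c(0x00, 0xff, 0x00))

# Raw bytes convert to two-digit hex, so pasting them yields a hex string
green_hex <- paste(green_bytes, collapse = "")

# Hypothetical stand-in for a call that fails on a cell without a fill
failing_call <- function() stop("no fill color")

# tryCatch turns the error into NULL, and pasting NULL gives an empty string
no_fill     <- tryCatch(failing_call(), error = function(e) NULL)
no_fill_hex <- paste(no_fill, collapse = "")
```

This is why cells without a fill show up as empty strings rather than stopping the whole sapply call.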

Using sapply you can get the background color for all of the cells:

sapply(styles, cellColor)
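To see which cell each color belongs to, the names of the result can be split into row and column numbers. The sketch below assumes the result keeps the "row.column" names that the xlsx documentation describes for getCells(); the colors vector is made-up sample data standing in for the sapply() output, and color_table is a name chosen for illustration.

```r
# Made-up stand-in for the named vector returned by sapply(styles, cellColor)
colors <- c("1.1" = "ff0000", "1.2" = "", "2.1" = "00ff00")

# Split each "row.column" name at the literal dot
parts <- strsplit(names(colors), ".", fixed = TRUE)

# One row per cell: row index, column index, and the hex color ("" = no fill)
color_table <- data.frame(
  row   = as.integer(sapply(parts, `[`, 1)),
  col   = as.integer(sapply(parts, `[`, 2)),
  color = unname(colors),
  stringsAsFactors = FALSE
)
```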

You can also categorize the cells by matching their RGB codes against known colors:

# Named lookup of known colors (hex strings without "#")
mycolor <- c(green = "00ff00", red = "ff0000")
m       <- match(sapply(styles, cellColor), mycolor)
labs    <- names(mycolor)[m]
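Cells whose color is not in the lookup come back as NA from match(). A short sketch with made-up color strings (not read from a real workbook) shows the result; the tolower() call is an added precaution, assuming the hex strings might differ in case.

```r
# Lookup of known colors (hex strings without "#")
mycolor <- c(green = "00ff00", red = "ff0000")

# Made-up stand-in for sapply(styles, cellColor); "" is a cell with no fill
found <- c("FF0000", "", "00ff00", "cccccc")

# Normalize case, then look up each color; unmatched entries become NA
m    <- match(tolower(found), mycolor)
labs <- names(mycolor)[m]
```

Here the unfilled cell and the gray cell are both left unlabeled, so they are easy to filter out with is.na(labs).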

You can read more and learn how to apply it at R-bloggers

You can get the RGB codes from RapidTables.com

M-- answered Oct 04 '22 20:10


Old question but maybe it can help someone in the future.

There is a strange behavior in the POI (Java) library (at least on my computer): it does not return the colors correctly. The code in @M--'s answer works well when the color is a basic (indexed) color, but not when the color is, for example, a shade of gray. To work around this, you can use the following code, which calls getTint(). The tint is a number between -1 (dark) and 1 (light); combining it with the RGB value from getRgb() lets you recover the color completely.

cell_color <- function(style) {
  fg <- style$getFillForegroundXSSFColor()

  # Both calls fail when the cell has no fill, so fall back to NULL
  rgb_bytes <- tryCatch(fg$getRgb(), error = function(e) NULL)
  hex  <- paste0("#", paste(rgb_bytes, collapse = ""))
  tint <- tryCatch(fg$getTint(), error = function(e) NULL)

  # Apply the tint only when both an RGB value and a tint were found.
  # Tint varies between -1 (dark) and 1 (light).
  if (!is.null(tint) && !is.null(rgb_bytes)) {
    rgb_col <- col2rgb(col = hex)

    if (tint < 0) rgb_col <- (1 - abs(tint)) * rgb_col
    if (tint > 0) rgb_col <- rgb_col + (255 - rgb_col) * tint

    hex <- rgb(red = rgb_col[1, 1],
               green = rgb_col[2, 1],
               blue = rgb_col[3, 1],
               maxColorValue = 255)
  }

  return(hex)
}
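The tint arithmetic can be tried on its own, without a workbook. In this sketch apply_tint is a hypothetical helper name and the colors and tints are made-up inputs; it repeats the same formula as the function above, using only base R.

```r
# Hypothetical helper: apply an Excel-style tint to a "#rrggbb" color
apply_tint <- function(hex, tint) {
  rgb_col <- col2rgb(hex)   # 3 x 1 matrix with red, green, blue in 0..255

  if (tint < 0) rgb_col <- (1 - abs(tint)) * rgb_col         # scale toward black
  if (tint > 0) rgb_col <- rgb_col + (255 - rgb_col) * tint  # move toward white

  rgb(red = rgb_col[1, 1],
      green = rgb_col[2, 1],
      blue = rgb_col[3, 1],
      maxColorValue = 255)
}

darker    <- apply_tint("#808080", -0.5)  # mid gray, half as bright
lightest  <- apply_tint("#000000", 1)     # black pushed all the way to white
unchanged <- apply_tint("#ff0000", 0)     # a tint of 0 leaves the color as is
```

A negative tint scales every channel toward 0, while a positive tint closes part of the gap between each channel and 255, which is how one base color can yield a whole range of grays.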

Some references to help:

https://poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/HSSFExtendedColor.html#getTint--

https://bz.apache.org/bugzilla/show_bug.cgi?id=50787

Getting Excel fill colors using Apache POI

Douglas Mesquita answered Oct 04 '22 22:10