Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to unread hide excel sheet in R(read_excel)?

Tags:

r

xlsx

Would like to know is there any method or techniques which can ignore the hidden excel_sheet in an excel workbook (read_excel).

Why i am asking is ,Have many excel files in that many sheets are hidden.

what i have tried so far is referred below link.

How to ignore hidden data when importing from Excel

Appreciate any help.

like image 989
Tushar Lad Avatar asked Aug 19 '20 05:08

Tushar Lad


1 Answers

Here's a function that will inform whatever function you want to use, whether it's readxl::read_excel or otherwise. (I've tested this with a couple of .xlsx files, but nothing extensive.)

This requires the xml2 package.

xlsx_sheet_info <- function(filename) {
  stopifnot(file.exists(filename))
  tmpdir <- tempfile(pattern = "xldir")
  fn <- try(
    utils::unzip(filename, files = "xl/workbook.xml", exdir = tmpdir, junkpaths = TRUE),
    silent = TRUE)
  if (inherits(fn, "try-error")) stop("unable to find 'xl/workbook.xml'")
  on.exit({
    # clean up our temporary directory used to extract the file
    if (dir.exists(tmpdir)) {
      suppressWarnings(unlink(tmpdir, recursive = TRUE, force = TRUE))
    }
  })
  xml <- try(xml2::read_xml(fn), silent = TRUE)
  if (inherits(xml, "try-error")) stop("unable to parse xml")
  # everything we want in sheets is stored as element attributes
  sheets <- lapply(xml2::as_list(xml)$workbook$sheets, attributes)
  nms <- unique(unlist(lapply(sheets, names)))
  out <- do.call(rbind.data.frame, lapply(sheets, function(sh) {
    # I believe attributes will always be 'character'
    sh[setdiff(nms, names(sh))] <- NA_character_
    # return them in-order, safe for 'rbind.data.frame'
    sh[nms]
  }))
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

I created a simple workbook with three worksheets, with various levels of "hidden"-ness: xlSheetVisible, xlSheetHidden, and xlSheetVeryHidden (per Excel VBA enums).

info <- xlsx_sheet_info("Book1.xlsx")
info
#          name sheetId   id      state
# sheet  Sheet1       1 rId1       <NA>
# sheet1 Sheet3       3 rId2 veryHidden
# sheet2 Sheet2       2 rId3     hidden

This is a data.frame, so you can use is.na(info$state) (or grep for "hidden") to choose specific rows, either by name or sheetId (I assume these are monotonic integers, compatible with readxl::read_xlsx and friends).

I would not assume that the order of the sheets is preserved in the rows, as is suggested here. In fact, I created the sheets in order: visible, hidden, very-hidden.

like image 175
r2evans Avatar answered Sep 28 '22 18:09

r2evans