Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use R and Openxlsx to output a list of dataframes as worksheets in a single Excel file

I have a set of CSV files. I want to package them up and export the data to a single Excel file that contains multiple worksheets. I read in the CSV files as a set of data frames.

My problem is how to construct the command in openxlsx, I can do it manually, but I am having a list construction issue. Specifically how to add a data frame as a subcomponent of a named list and then pass as a parameter to write.xlsx()

Example

Ok, so I first list the CSV files on disk and generate a set of data frames in memory...

# Generate a list of csv files on disk and shorten names... 
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "*.csv"
fileListwithPath = list.files(path = filePath, pattern = filePattern, full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)

datasets <- gsub("*.csv$", "", fileList)
datasets <- gsub("sample_", "S", datasets)
datasets

# Now generate the dataframes for each csv file...
list2env(
  lapply(setNames(fileListwithPath, make.names(datasets)),
         read.csv), envir = .GlobalEnv)

Example Output:

dput(datasets)
c("S10000_R3.3.2_201703301839", "S10000_T4.3.0_201703301843", 
"S20000_R3.3.2_201703301826", "S20000_T4.3.0_201703301832", "S280000_R3.3.2_201704020847", 
"S280000_T4.3.0_201704021100", "S290000_R3.3.2_201704020447", 
"S290000_T4.3.0_201704020702", "S30000_R3.3.2_201703301803", 
"S30000_T4.3.0_201703301817", "S310000_R3.3.2_201704012331", 
"S310000_T4.3.0_201704020242", "S320000_R3.3.2_201704011827", 
"S320000_T4.3.0_201704012128", "S330000_R3.3.2_201704011304", 
"S330000_T4.3.0_201704011546", "S340000_R3.3.2_201704010652", 
"S340000_T4.3.0_201704011010", "S350000_R3.3.2_201704010020", 
"S350000_T4.3.0_201704010404", "S360000_R3.3.2_201703311819", 
"S360000_T4.3.0_201703312134", "S370000_R3.3.2_201703310914", 
"S370000_T4.3.0_201703311301", "S380000_R3.3.2_201703310134", 
"S380000_T4.3.0_201703310509", "S390000_R3.3.2_201703301846", 
"S390000_T4.3.0_201703302252", "S40000_R3.3.2_201703301738", 
"S40000_T4.3.0_201703301752", "S50000_R3.3.2_201703301707", "S50000_T4.3.0_201703301724", 
"S60000_R3.3.2_201703301624", "S60000_T4.3.0_201703301647", "S70000_R3.3.2_201703301535", 
"S70000_T4.3.0_201703301602", "S80000_R3.3.2_201703301430", "S80000_T4.3.0_201703301508", 
"S90000_R3.3.2_201703301324", "S90000_T4.3.0_201703301400")

Now we have a set of data frames and we wish to create a single excel file with multiple worksheets...

wb <- createWorkbook()
saveWorkbook(wb, 'output.xlsx')

lapply(names(myList), function(x) write.xlsx(myList[[x]], 'output.xlsx', sheetName=x, append=TRUE))

Problem:

The problem is I can create the list structure manually and can confirm it works BUT I cannot seem to construct the list automatically.

myList <- sapply(datasets,function(x) NULL)
names(myList)
str(myList)
myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)

thus:

> str(myList)
List of 40
 $ S10000_R3.3.2_201703301839 :'data.frame':    43 obs. of  4 variables:
  ..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
  ..$ user     : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
  ..$ system   : num [1:43] 0.63 0.065 0.001 0.004 0 ...
  ..$ elapsed  : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
 $ S10000_T4.3.0_201703301843 : NULL
 $ S20000_R3.3.2_201703301826 : NULL
 ...

Specific Problem: How to append each data frame to the list...

myList <- lapply( myList, function(x) eval(x) )

what am I doing wrong with lapply here? The above lapply() does not iterate through the list and append the data frame to the name list entry.

i.e. myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
> str(myList)
    List of 40
     $ S10000_R3.3.2_201703301839 :'data.frame':    43 obs. of  4 variables:
      ..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
      ..$ user     : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
      ..$ system   : num [1:43] 0.63 0.065 0.001 0.004 0 ...
      ..$ elapsed  : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
     $ S10000_T4.3.0_201703301843 : NULL
     $ S20000_R3.3.2_201703301826 : NULL
     ...

What am I missing? All help gratefully appreciated. Yes, I am pretty certain I am missing something obvious... but... I am stumped.

like image 348
Technophobe01 Avatar asked Apr 03 '17 03:04

Technophobe01


People also ask

Can R read Excel file with multiple sheets?

For importing multiple Excel sheets into R, we have to, first install a package in R which is known as readxl. After successfully installing the package, we have to load the package using the library function is R.


2 Answers

I think it may be worth adding solution using imap function from the purrr package as it offers a convenient mechanism to access name and index of the list element in one call:

imap_xxx(x, ...), an indexed map, is short hand for map2(x, names(x), ...) if x has names, or map2(x, seq_along(x), ...) if it does not. This is useful if you need to compute on both the value and the position of an element.

imap solution

On dummy data for reproducibility.

lst_data <- list(cars = mtcars, air = airmiles)
wb <- openxlsx::createWorkbook()
purrr::imap(
    .x = lst_data,
    .f = function(df, object_name) {
        openxlsx::addWorksheet(wb = wb, sheetName = object_name)
        openxlsx::writeData(wb = wb, sheet = object_name, x = df)
    }
)
t_file <- tempfile(pattern = "test_df_export", fileext = ".xlsx")
saveWorkbook(wb = wb, file = t_file)
like image 168
Konrad Avatar answered Oct 01 '22 14:10

Konrad


Here's the solution with openxlsx:

## create data;
dataframes <- split(iris, iris$Species)

# create workbook
wb <- createWorkbook()
 
#Iterate the same way as PavoDive, slightly different (creating an anonymous function inside Map())
Map(function(data, nameofsheet){     

    addWorksheet(wb, nameofsheet)
    writeData(wb, nameofsheet, data)
 
}, dataframes, names(dataframes))
         
## Save workbook to excel file 
saveWorkbook(wb, file = "file.xlsx", overwrite = TRUE)

.. however, openxlsx is also able to use it's function openxlsx::write.xlsx for this, so you can just give the object with your list of dataframes and the filepath, and openxlsx is smart enough to create the list as sheets within the xlsx-file. The code I post here with Map() is if you want to format the sheets in a specific way.

like image 23
emilBeBri Avatar answered Oct 01 '22 13:10

emilBeBri