I have a set of CSV files. I want to package them up and export the data to a single Excel file that contains multiple worksheets. I read in the CSV files as a set of data frames.
My problem is how to construct the command in openxlsx
, I can do it manually, but I am having a list construction issue. Specifically how to add a data frame as a subcomponent of a named list and then pass as a parameter to write.xlsx()
Ok, so I first list the CSV files on disk and generate a set of data frames in memory...
# Generate a list of csv files on disk and shorten names...
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "*.csv"
fileListwithPath = list.files(path = filePath, pattern = filePattern, full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)
datasets <- gsub("*.csv$", "", fileList)
datasets <- gsub("sample_", "S", datasets)
datasets
# Now generate the dataframes for each csv file...
list2env(
lapply(setNames(fileListwithPath, make.names(datasets)),
read.csv), envir = .GlobalEnv)
Example Output:
dput(datasets)
c("S10000_R3.3.2_201703301839", "S10000_T4.3.0_201703301843",
"S20000_R3.3.2_201703301826", "S20000_T4.3.0_201703301832", "S280000_R3.3.2_201704020847",
"S280000_T4.3.0_201704021100", "S290000_R3.3.2_201704020447",
"S290000_T4.3.0_201704020702", "S30000_R3.3.2_201703301803",
"S30000_T4.3.0_201703301817", "S310000_R3.3.2_201704012331",
"S310000_T4.3.0_201704020242", "S320000_R3.3.2_201704011827",
"S320000_T4.3.0_201704012128", "S330000_R3.3.2_201704011304",
"S330000_T4.3.0_201704011546", "S340000_R3.3.2_201704010652",
"S340000_T4.3.0_201704011010", "S350000_R3.3.2_201704010020",
"S350000_T4.3.0_201704010404", "S360000_R3.3.2_201703311819",
"S360000_T4.3.0_201703312134", "S370000_R3.3.2_201703310914",
"S370000_T4.3.0_201703311301", "S380000_R3.3.2_201703310134",
"S380000_T4.3.0_201703310509", "S390000_R3.3.2_201703301846",
"S390000_T4.3.0_201703302252", "S40000_R3.3.2_201703301738",
"S40000_T4.3.0_201703301752", "S50000_R3.3.2_201703301707", "S50000_T4.3.0_201703301724",
"S60000_R3.3.2_201703301624", "S60000_T4.3.0_201703301647", "S70000_R3.3.2_201703301535",
"S70000_T4.3.0_201703301602", "S80000_R3.3.2_201703301430", "S80000_T4.3.0_201703301508",
"S90000_R3.3.2_201703301324", "S90000_T4.3.0_201703301400")
wb <- createWorkbook()
saveWorkbook(wb, 'output.xlsx')
lapply(names(myList), function(x) write.xlsx(myList[[x]], 'output.xlsx', sheetName=x, append=TRUE))
The problem is I can create the list structure manually and can confirm it works BUT I cannot seem to construct the list automatically.
myList <- sapply(datasets,function(x) NULL)
names(myList)
str(myList)
myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
thus:
> str(myList)
List of 40
$ S10000_R3.3.2_201703301839 :'data.frame': 43 obs. of 4 variables:
..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
..$ user : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
..$ system : num [1:43] 0.63 0.065 0.001 0.004 0 ...
..$ elapsed : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
$ S10000_T4.3.0_201703301843 : NULL
$ S20000_R3.3.2_201703301826 : NULL
...
myList <- lapply( myList, function(x) eval(x) )
what am I doing wrong with lapply here? The above lapply() does not iterate through the list and append the data frame to the name list entry.
i.e. myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
> str(myList)
List of 40
$ S10000_R3.3.2_201703301839 :'data.frame': 43 obs. of 4 variables:
..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
..$ user : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
..$ system : num [1:43] 0.63 0.065 0.001 0.004 0 ...
..$ elapsed : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
$ S10000_T4.3.0_201703301843 : NULL
$ S20000_R3.3.2_201703301826 : NULL
...
What am I missing? All help gratefully appreciated. Yes, I am pretty certain I am missing something obvious... but... I am stumped.
For importing multiple Excel sheets into R, we have to, first install a package in R which is known as readxl. After successfully installing the package, we have to load the package using the library function is R.
I think it may be worth adding solution using imap
function from the purrr
package as it offers a convenient mechanism to access name and index of the list element in one call:
imap_xxx(x, ...)
, an indexed map, is short hand formap2(x, names(x), ...)
ifx
has names, ormap2(x, seq_along(x), ...)
if it does not. This is useful if you need to compute on both the value and the position of an element.
imap
solutionOn dummy data for reproducibility.
lst_data <- list(cars = mtcars, air = airmiles)
wb <- openxlsx::createWorkbook()
purrr::imap(
.x = lst_data,
.f = function(df, object_name) {
openxlsx::addWorksheet(wb = wb, sheetName = object_name)
openxlsx::writeData(wb = wb, sheet = object_name, x = df)
}
)
t_file <- tempfile(pattern = "test_df_export", fileext = ".xlsx")
saveWorkbook(wb = wb, file = t_file)
Here's the solution with openxlsx
:
## create data;
dataframes <- split(iris, iris$Species)
# create workbook
wb <- createWorkbook()
#Iterate the same way as PavoDive, slightly different (creating an anonymous function inside Map())
Map(function(data, nameofsheet){
addWorksheet(wb, nameofsheet)
writeData(wb, nameofsheet, data)
}, dataframes, names(dataframes))
## Save workbook to excel file
saveWorkbook(wb, file = "file.xlsx", overwrite = TRUE)
.. however, openxlsx
is also able to use it's function openxlsx::write.xlsx
for this, so you can just give the object with your list of dataframes and the filepath, and openxlsx
is smart enough to create the list as sheets within the xlsx-file. The code I post here with Map()
is if you want to format the sheets in a specific way.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With