I have hundreds of medium-sized Excel files (between 5000 and 50.0000 rows with about 100 columns) to load into R. They follow a well-defined naming pattern, like <code>x_1.xlsx</code>, <code>x_2.xlsx</code>, etc. How can I load these files into R in the fastest, most straightforward way?

With <code>list.files</code> you can create a character vector of all the file names in your working directory that match a pattern. Next, you can use <code>lapply</code> to loop over that vector and read each file with the <code>read_excel</code> function from the <code>readxl</code> package:
<pre class="prettyprint"><code>library(readxl)

# 'pattern' is a regular expression, not a glob: match names ending in ".xlsx"
file.list <- list.files(pattern = "\\.xlsx$")
df.list <- lapply(file.list, read_excel)
</code></pre>
This method can of course also be used with other file-reading functions like <code>read.csv</code> or <code>read.table</code>. Just replace <code>read_excel</code> with the appropriate function and make sure the pattern in <code>list.files</code> matches the files you want.

If you also want to include the files in subdirectories, use:
<pre class="prettyprint"><code>file.list <- list.files(pattern = "\\.xlsx$", recursive = TRUE)
</code></pre>
Other possible packages for reading Excel files: openxlsx and xlsx
<hr>
Supposing the columns are the same for each file, you can bind them together into one data frame with <code>bind_rows</code> from dplyr:
<pre class="prettyprint"><code>library(dplyr)
df <- bind_rows(df.list, .id = "id")
</code></pre>
or with <code>rbindlist</code> from data.table:
<pre class="prettyprint"><code>library(data.table)
df <- rbindlist(df.list, idcol = "id")
</code></pre>
Both have the option to add an <code>id</code> column for identifying the separate datasets.
<hr>
Update: If you don't want a numeric identifier, just use <code>sapply</code> with <code>simplify = FALSE</code> to read the files in <code>file.list</code> (shown here with <code>read.csv</code>; use <code>read_excel</code> for Excel files):
<pre class="prettyprint"><code>df.list <- sapply(file.list, read.csv, simplify = FALSE)
</code></pre>
Because <code>sapply</code> names each list element after the file it was read from, the <code>id</code> column now contains the file names when you use <code>bind_rows</code> from dplyr or <code>rbindlist</code> from data.table.
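As a self-contained illustration of the named-list behaviour described above, here is a minimal sketch that writes two tiny CSV files to a temporary directory, reads them back, and binds them. The directory name `read_many_demo` and the sample data are made up for the demonstration, and it assumes dplyr is installed.

```r
library(dplyr)

# Create two small CSV files in a temporary directory (illustrative data only)
tmp <- file.path(tempdir(), "read_many_demo")   # hypothetical directory name
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = 1:2, b = c("u", "v")),
          file.path(tmp, "x_1.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:4, b = c("w", "z")),
          file.path(tmp, "x_2.csv"), row.names = FALSE)

# full.names = TRUE returns paths that can be read from any working directory
file.list <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# sapply() names each list element after the string it was called with
df.list <- sapply(file.list, read.csv, simplify = FALSE)

# Keep only the file name, dropping the directory part
names(df.list) <- basename(names(df.list))

# The list names end up in the 'id' column
df <- bind_rows(df.list, .id = "id")
```

With `full.names = TRUE` the list names would otherwise be full paths, which is why the sketch applies `basename` before binding.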
Yet another approach is to use the <code>purrr</code> package:
<pre class="prettyprint"><code>library(purrr)

file.list <- list.files(pattern = "\\.csv$")
# only needed when you want an id column with the file names
file.list <- setNames(file.list, file.list)
df <- map_df(file.list, read.csv, .id = "id")
</code></pre>
<hr>
Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the file names to the data frames in the list before you bind them together. There are several ways to do this:
<pre class="prettyprint"><code># with the 'attr' function from base R
attr(df.list, "names") <- file.list

# with the 'names' function from base R
names(df.list) <- file.list

# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)
</code></pre>
Now you can bind the list of data frames together into one data frame with <code>rbindlist</code> from data.table or <code>bind_rows</code> from dplyr. The <code>id</code> column will now contain the file names instead of a numeric identifier.
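One thing to keep in mind when relying on the numeric identifier: <code>list.files</code> returns names in alphabetical order, so <code>x_10.xlsx</code> sorts before <code>x_2.xlsx</code>. A minimal sketch, assuming the <code>x_&lt;number&gt;.xlsx</code> naming from the question and using a hard-coded vector in place of a real directory listing, shows one way to reorder the names by their number before reading:

```r
# Hypothetical file names in the order list.files() would return them
file.list <- c("x_1.xlsx", "x_10.xlsx", "x_2.xlsx")

# Extract the digits between "x_" and ".xlsx" and convert them to integers
file.num <- as.integer(sub("^x_([0-9]+)\\.xlsx$", "\\1", file.list))

# Reorder the file names by that number
file.list <- file.list[order(file.num)]
file.list
```

After this step, position <code>i</code> in <code>file.list</code> (and therefore a numeric <code>id</code>) follows the numeric order of the file names, provided every name matches the assumed pattern.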

How can I read multiple (excel) files into R? [duplicate]

133

answered Oct 11 '22 13:10

Jaap

Tags:

import

r

excel

Manuel R
