What's wrong with my function to load multiple .csv files into single dataframe in R using rbind?

Tags: r, csv, rbind

I have written the following function to combine 300 .csv files. My directory name is "specdata". I executed the following steps:

x <- function(directory) {     
    dir <- directory    
    data_dir <- paste(getwd(),dir,sep = "/")    
    files  <- list.files(data_dir,pattern = '\\.csv')    
    tables <- lapply(paste(data_dir,files,sep = "/"), read.csv, header = TRUE)    
    pollutantmean <- do.call(rbind , tables)         
}

# Step 2: call the function
x("specdata")

# Step 3: inspect results
head(pollutantmean)

Error in head(pollutantmean) : object 'pollutantmean' not found

What is my mistake? Can anyone please explain?

asked Apr 21 '14 03:04

Sivanantham C

4 Answers

There's a lot of unnecessary code in your function. You can simplify it to:

load_data <- function(path) { 
  files <- dir(path, pattern = '\\.csv', full.names = TRUE)
  tables <- lapply(files, read.csv)
  do.call(rbind, tables)
}

pollutantmean <- load_data("specdata")
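The reason the original code fails is that `pollutantmean` is created inside `x()`, so it exists only in that call's local environment. The function does return the combined data frame (the value of its last expression, here an assignment, which is returned invisibly), but calling `x("specdata")` on its own line throws that value away. Below is a minimal, self-contained sketch that reproduces the problem and the fix. It uses a few made-up CSV files written to `tempdir()` in place of the real `specdata` folder. The `demo_dir` name and the `sulfate` column are illustrative only.

```r
# Hypothetical demo data: three small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(ID = i, sulfate = i * 1.5),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

# Simplified version of the question's function. As in the original,
# the last line assigns to a *local* variable: its value is returned
# (invisibly), but the name pollutantmean vanishes when the call ends.
x <- function(directory) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv)
  pollutantmean <- do.call(rbind, tables)
}

x(demo_dir)               # prints nothing, and the result is thrown away
exists("pollutantmean")   # FALSE in a fresh session

# The fix: store the returned data frame in the calling environment
pollutantmean <- x(demo_dir)
head(pollutantmean)
```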

Be aware that do.call + rbind is relatively slow. You might find dplyr::bind_rows or data.table::rbindlist to be substantially faster.
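Here is a small sketch of the two faster alternatives. It assumes the data.table and dplyr packages are installed and skips each step if its package is missing. The three toy data frames stand in for the list returned by `lapply(files, read.csv)`.

```r
# Toy list of data frames standing in for lapply(files, read.csv)
tables <- lapply(1:3, function(i) data.frame(ID = i, value = i * 10))

# Base R: works everywhere, but can be slow for hundreds of files
combined_base <- do.call(rbind, tables)

# data.table: rbindlist() binds the whole list in one step (returns a data.table)
if (requireNamespace("data.table", quietly = TRUE)) {
  combined_dt <- data.table::rbindlist(tables)
}

# dplyr: bind_rows() also accepts a list of data frames
if (requireNamespace("dplyr", quietly = TRUE)) {
  combined_dplyr <- dplyr::bind_rows(tables)
}
```

All three produce one row per input row. Only the class of the result differs.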

answered Oct 13 '22 22:10

hadley

To update Prof. Wickham's answer above with code from the more recent purrr package, which he co-authored with Lionel Henry:

library(magrittr)  # %>%
library(purrr)     # map_df()
library(readr)     # read_csv()
Tbl <-
    list.files(pattern = "\\.csv$") %>%
    map_df(~read_csv(.))

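If readr's column-type guessing misbehaves (for example, a column is read as numbers in one file and as text in another, so the tables won't bind), you can force every column to be read as character:

Tbl <-
    list.files(pattern = "\\.csv$") %>%
    map_df(~read_csv(., col_types = cols(.default = "c")))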
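Reading every column as character leaves you with a table of strings. One way to get proper types back is to bind first and then let `readr::type_convert()` guess each column's type once, on the combined data. The sketch below assumes readr and dplyr are installed. The two demo files are invented so that the `code` column looks numeric in one file and textual in the other, which is the kind of mismatch that makes binding separately parsed tables fail.

```r
if (requireNamespace("readr", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  # Hypothetical demo files: `code` looks numeric in a.csv, textual in b.csv
  type_dir <- file.path(tempdir(), "mixed_types_demo")
  dir.create(type_dir, showWarnings = FALSE)
  writeLines(c("code,value", "1,2.5", "2,3.5"), file.path(type_dir, "a.csv"))
  writeLines(c("code,value", "A7,4.5"), file.path(type_dir, "b.csv"))
  files <- list.files(type_dir, pattern = "\\.csv$", full.names = TRUE)

  # Parsed separately, `code` becomes double in one table and character
  # in the other, so bind_rows() refuses to combine them
  guessed <- lapply(files, readr::read_csv, col_types = readr::cols())
  bind_error <- tryCatch(dplyr::bind_rows(guessed), error = function(e) e)

  # Read every column as character so the tables always bind ...
  all_chr <- dplyr::bind_rows(
    lapply(files, readr::read_csv, col_types = readr::cols(.default = "c"))
  )
  # ... then re-guess the column types once, on the combined table
  typed <- readr::type_convert(all_chr)
}
```

Here `code` stays character because "A7" is not a number, while `value` is converted back to numeric.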

If your files are in a subdirectory, pass its path to list.files() and set full.names = TRUE so that each entry includes the directory as well as the file name. That way read_csv() can find the files even though they are outside the current working directory. (Think of the full path names as passports that let the files cross directory 'borders'.)

Tbl <-
    list.files(path = "./subdirectory/",
               pattern = "\\.csv$",
               full.names = TRUE) %>%
    map_df(~read_csv(., col_types = cols(.default = "c")))
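If the CSV files are spread over several levels of subfolders rather than one, `list.files()` can search the whole tree with `recursive = TRUE`. Below is a base-R sketch against a made-up folder layout under `tempdir()`. The folder names are hypothetical. The resulting `files` vector could equally be piped into the `map_df()` call above.

```r
# Hypothetical layout: one CSV per subfolder, at different depths
root <- file.path(tempdir(), "nested_demo")
for (sub in c("site_a", "site_b/2014")) {
  dir.create(file.path(root, sub), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(site = sub, reading = 1),
            file.path(root, sub, "readings.csv"), row.names = FALSE)
}

# recursive = TRUE descends into every subfolder;
# full.names = TRUE keeps the path so read.csv() can find each file
files <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
nested_data <- do.call(rbind, lapply(files, read.csv))
```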

As Prof. Wickham has described it:

map_df(x, f) is effectively the same as do.call("rbind", lapply(x, f)) but under the hood is much more efficient.

And thank you to Jake Kaupp for introducing me to map_df().
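For readers on current package versions: purrr 1.0.0 marked `map_df()` (an alias of `map_dfr()`) as superseded and recommends `map()` followed by `list_rbind()`. readr 2.0.0 added the ability to pass a whole vector of file paths to `read_csv()`. The following is a hedged sketch of both. It assumes purrr >= 1.0.0 and readr >= 2.0.0 are installed and uses made-up files under `tempdir()`. The `csv_dir` name and the `reading` column are illustrative only.

```r
# Requires purrr >= 1.0.0 (list_rbind) and readr >= 2.0.0 (multi-file read_csv)
if (requireNamespace("purrr", quietly = TRUE) &&
    requireNamespace("readr", quietly = TRUE) &&
    packageVersion("purrr") >= "1.0.0" &&
    packageVersion("readr") >= "2.0.0") {
  # Hypothetical demo files
  csv_dir <- file.path(tempdir(), "purrr_demo")
  dir.create(csv_dir, showWarnings = FALSE)
  for (i in 1:3) {
    write.csv(data.frame(ID = i, reading = i / 2),
              file.path(csv_dir, sprintf("%03d.csv", i)), row.names = FALSE)
  }
  files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

  # purrr >= 1.0.0: map() + list_rbind() replaces the superseded map_df()
  tbl_purrr <- purrr::list_rbind(
    purrr::map(files, ~ readr::read_csv(.x, show_col_types = FALSE))
  )

  # readr >= 2.0.0: read_csv() takes several files at once;
  # id = "file" adds a column holding each row's source path
  tbl_readr <- readr::read_csv(files, id = "file", show_col_types = FALSE)
}
```

The `id` column is handy when you later need to know which file (for example, which monitor) each row came from.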

answered Oct 13 '22 23:10

leerssej

This can be done very succinctly with dplyr and purrr from the tidyverse. If x is a vector of paths to your CSV files, you can simply use:

dplyr::bind_rows(purrr::map(x, read.csv))

Mapping read.csv over x produces a list of data frames, which bind_rows() then combines into one.
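One practical difference between bind_rows() and base rbind() is shown in the sketch below, which uses two made-up data frames whose columns differ (for example, two monitors that measured different pollutants). Base `rbind()` stops with an error. `dplyr::bind_rows()` matches columns by name and fills the gaps with `NA`, assuming dplyr is installed.

```r
a <- data.frame(ID = 1, sulfate = 2.1)
b <- data.frame(ID = 2, nitrate = 0.4)   # different second column

# Base R: rbind() refuses to combine data frames whose names differ
base_result <- tryCatch(rbind(a, b), error = function(e) conditionMessage(e))
print(base_result)  # an error message about names not matching

if (requireNamespace("dplyr", quietly = TRUE)) {
  # bind_rows() matches columns by name and fills missing ones with NA
  filled <- dplyr::bind_rows(a, b)
  print(filled)
}
```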

answered Oct 14 '22 00:10

CClarke

```{r echo = FALSE, warning = FALSE, message = FALSE}

setwd("~/Data/R/BacklogReporting/data/PastDue/global/") ## where the files are located

path <- "~/Data/R/BacklogReporting/data/PastDue/global/"
out.file <- NULL  ## NULL disappears in rbind(); a "" placeholder can leave an extra row
file.names <- dir(path, pattern = "\\.csv$")
for (i in seq_along(file.names)) {
  file <- read.csv(file.names[i], header = TRUE, stringsAsFactors = FALSE)
  out.file <- rbind(out.file, file)
}

## write the stacked file to the global_stacked subdirectory, then read it back
stacked.file <- "~/Data/R/BacklogReporting/data/PastDue/global/global_stacked/past_due_global_stacked.csv"
write.csv(out.file, file = stacked.file, row.names = FALSE)

past_due_global_stacked <- read.csv(stacked.file, stringsAsFactors = FALSE)

## comma-separated list of the CSV files that were stacked
files <- paste(list.files(pattern = "\\.csv$"), collapse = ", ")
```
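Growing `out.file` with `rbind()` inside the loop copies the accumulated data on every iteration, which gets slow with hundreds of files. Below is a sketch of the same stack-and-write workflow that reads everything first and binds once. The directory names, `region_*.csv` files, and `stacked.csv` output are placeholders under `tempdir()`, not the paths from the answer above.

```r
# Hypothetical input folder with a subfolder for the stacked output
in_dir <- file.path(tempdir(), "past_due_demo")
out_dir <- file.path(in_dir, "global_stacked")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(order = i, days_late = i * 7),
            file.path(in_dir, sprintf("region_%d.csv", i)), row.names = FALSE)
}

# Read every CSV in one pass (full paths, so no setwd() is needed),
# then bind once instead of growing a data frame inside a loop
file_names <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
stacked <- do.call(rbind, lapply(file_names, read.csv, stringsAsFactors = FALSE))

# Write the result into the subfolder; list.files() above is not recursive,
# so a re-run will not pick the stacked file up as an input
out_file <- file.path(out_dir, "stacked.csv")
write.csv(stacked, out_file, row.names = FALSE)
```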

answered Oct 14 '22 00:10

Dave Headrick
