How to loop through a folder of CSV files in R

Question

I have a folder containing a bunch of CSV files that are titled "yob1980", "yob1981", "yob1982" etc.

I have to use a for loop to go through each file and put its contents into a data frame - the columns in the data frame should be "1980", "1981", "1982" etc

Here is what I have:

file_list <- list.files()

temp = list.files(pattern="*.txt")
babynames <- do.call(rbind,lapply(temp,read.csv, FALSE))

names(babynames) <- c("Name", "Gender", "Count")

I feel like I need a for loop, but I'm not sure how to loop through the files. Anyone point me in the right direction?

rosscova · Accepted Answer

My favourite way to do this is using ldply from the plyr package. It has the advantage of returning a dataframe, so you don't need to do the rbind step afterwards:

library( plyr )
babynames <- ldply( .data = list.files(pattern="*.txt"),
                    .fun = read.csv,
                    header = FALSE,
                    col.names=c("Name", "Gender", "Count") )

As an added benefit, you can multi-thread the import very easily, making importing large multi-file datasets quite a bit faster:

library( plyr )
library( doMC )
registerDoMC( cores = 4 )
babynames <- ldply( .data = list.files(pattern="*.txt"),
                    .fun = read.csv,
                    header = FALSE,
                    col.names=c("Name", "Gender", "Count"),
                    .parallel = TRUE )

Changing the above slightly to include a Year column in the resulting data frame, you can create a function first, then execute that function within ldply in the same way you would execute read.csv

readFun <- function( filename ) {

    # read in the data
    data <- read.csv( filename, 
                      header = FALSE, 
                      col.names = c( "Name", "Gender", "Count" ) )

    # add a "Year" column by removing both "yob" and ".txt" from file name
    data$Year <- gsub( "yob|.txt", "", filename )

    return( data )
}

# execute that function across all files, outputting a data frame
doMC::registerDoMC( cores = 4 )
babynames <- plyr::ldply( .data = list.files(pattern="*.txt"),
                          .fun = readFun,
                          .parallel = TRUE )

This will give you your data in a concise and tidy way, which is how I'd recommend moving forward from here. While it is possible to then separate each year's data into it's own column, it's likely not the best way to go.

Note: depending on your preference, it may be a good idea to convert the Year column to say, integer class. But that's up to you.

Icaro Bombonato · Answer

Using purrr

library(tidyverse)

files <- list.files(path = "./data/", pattern = "*.csv")

df <- files %>% 
    map(function(x) {
        read.csv(paste0("./data/", x))
    }) %>%
    reduce(rbind)

Parfait · Answer

Consider an anonymous function within an lapply():

files = list.files(pattern="*.txt")

dfList <- lapply(files, function(i) {
     df <- read.csv(i, header=FALSE, col.names=c("Name", "Gender", "Count"))
     df$Year <- gsub("yob", "", i) 
     return(df)
})

finaldf <- do.call(rbind, dflist)

How to loop through a folder of CSV files in R

Tags:

file

loops

r

csv

krypticlol

3 Answers

rosscova

Icaro Bombonato

Parfait

Recent Activity

Donate For Us

How to loop through a folder of CSV files in R

Tags:

file

loops

r

csv

krypticlol

3 Answers

rosscova

Icaro Bombonato

Parfait

Related questions

Recent Activity

Donate For Us