I have a bunch of csv files that follow the naming scheme: est2009US.csv.
I am reading them into R as follows:
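# full.names = TRUE returns full paths, so read.csv works from any working directory
myFiles <- list.files(path="~/Downloads/gtrends/", pattern = "^est[[:digit:]][[:digit:]][[:digit:]][[:digit:]]US*\\.csv$", full.names = TRUE)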
myDB <- do.call("rbind", lapply(myFiles, read.csv, header = TRUE))
I would like to find a way to create a new variable that, for each record, is populated with the name of the file the record came from.
You can avoid looping twice by using an anonymous function that assigns the file name as a column to each data.frame in the same lapply that you use to read the csvs.
myDB <- do.call("rbind", lapply(myFiles, function(x) {
  dat <- read.csv(x, header = TRUE)
  dat$fileName <- tools::file_path_sans_ext(basename(x))
  dat
}))
I stripped out the directory and file extension. basename() returns the file name, not including the directory, and tools::file_path_sans_ext() removes the file extension.
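To try the approach above without touching real data, here is a minimal sketch that writes two small CSV files to a temporary directory and then reads them back with the same lapply pattern. The directory, the demo_* names, and the values in the files are made up for illustration; only the file naming scheme comes from the question.

```r
# Create two small CSV files in a temporary directory (made-up data).
tmp_dir <- tempfile("gtrends_demo_")
dir.create(tmp_dir)
write.csv(data.frame(value = c(1, 2)),
          file.path(tmp_dir, "est2009US.csv"), row.names = FALSE)
write.csv(data.frame(value = c(3, 4, 5)),
          file.path(tmp_dir, "est2010US.csv"), row.names = FALSE)

# full.names = TRUE returns full paths, so read.csv can open the files
# regardless of the current working directory.
demo_files <- list.files(tmp_dir, pattern = "^est[[:digit:]]{4}US\\.csv$",
                         full.names = TRUE)

# Read each file and tag its rows with the file name (no directory, no extension).
demo_db <- do.call("rbind", lapply(demo_files, function(x) {
  dat <- read.csv(x, header = TRUE)
  dat$fileName <- tools::file_path_sans_ext(basename(x))
  dat
}))

# Count the rows contributed by each file.
table(demo_db$fileName)
```

Each row of demo_db carries the stem of the file it came from, so the combined data can be filtered or grouped by source file afterwards.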
plyr makes this very easy:
library(plyr)
paths <- dir(pattern = "\\.csv$")
names(paths) <- basename(paths)
all <- ldply(paths, read.csv)
Because paths is named, all will automatically get a column (called .id by default) containing those names.
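As a rough illustration of that behaviour, the sketch below runs the same ldply call on two throwaway CSV files. It assumes the plyr package is installed; the temporary directory, the demo_paths and combined names, and the file contents are invented for the example.

```r
library(plyr)

# Write two small CSV files to a temporary directory (made-up data).
tmp_dir <- tempfile("gtrends_plyr_")
dir.create(tmp_dir)
write.csv(data.frame(value = 1:2),
          file.path(tmp_dir, "est2009US.csv"), row.names = FALSE)
write.csv(data.frame(value = 3:4),
          file.path(tmp_dir, "est2010US.csv"), row.names = FALSE)

# Name each path after its file so ldply can carry the name into the result.
demo_paths <- dir(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
names(demo_paths) <- basename(demo_paths)

# ldply reads every file and row-binds the results; the names of demo_paths
# end up in the index column.
combined <- ldply(demo_paths, read.csv)
names(combined)
```

If the extension is not wanted in that column, the names could be set with tools::file_path_sans_ext(basename(demo_paths)) instead.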