
How can I turn the filename into a variable when reading multiple CSV files into R?

Tags: import, r, csv

I have a bunch of csv files that follow the naming scheme: est2009US.csv.

I am reading them into R as follows:

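# full.names = TRUE returns complete paths, so read.csv works from any working directory.
# The pattern matches "est", four digits, "US", then ".csv" (the original "US*" matched "U" plus any number of "S").
myFiles <- list.files(path = "~/Downloads/gtrends/", pattern = "^est[[:digit:]]{4}US\\.csv$", full.names = TRUE)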

myDB <- do.call("rbind", lapply(myFiles, read.csv, header = TRUE))

I would like to find a way to create a new variable that, for each record, is populated with the name of the file the record came from.

user2658742 asked Dec 11 '22 12:12


2 Answers

You can avoid looping twice by using an anonymous function that assigns the file name as a column to each data.frame in the same lapply that you use to read the csvs.

myDB <- do.call("rbind", lapply(myFiles, function(x) {
  dat <- read.csv(x, header=TRUE)
  dat$fileName <- tools::file_path_sans_ext(basename(x))
  dat
}))

I stripped out the directory and file extension. basename() returns the file name, not including the directory, and tools::file_path_sans_ext() removes the file extension.
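To try this without the original data, here is a minimal sketch of the same approach. The temporary directory, the two tiny files, and the names demoFiles and demoDB are made up for illustration; only the anonymous function mirrors the answer above.

```r
# Create two small CSV files in a temporary directory (hypothetical data).
tmp <- file.path(tempdir(), "gtrends_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp, "est2009US.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(tmp, "est2010US.csv"), row.names = FALSE)

# full.names = TRUE gives paths that read.csv can open from any working directory.
demoFiles <- list.files(tmp, pattern = "^est[[:digit:]]{4}US\\.csv$", full.names = TRUE)

# Read each file and tag its rows with the file name (directory and extension removed).
demoDB <- do.call("rbind", lapply(demoFiles, function(x) {
  dat <- read.csv(x, header = TRUE)
  dat$fileName <- tools::file_path_sans_ext(basename(x))
  dat
}))

# Count how many rows came from each file.
print(table(demoDB$fileName))
```

Each row of demoDB carries the name of the file it was read from, so the source can be used for grouping or filtering later.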

GSee answered Jan 12 '23 16:01


plyr makes this very easy:

library(plyr)
paths <- dir(pattern = "\\.csv$")
names(paths) <- basename(paths)

all <- ldply(paths, read.csv)

Because paths is named, all will automatically get a column (called .id by default) containing those names.
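If plyr is not available, the same idea can be approximated in base R. This is only a rough sketch of the behaviour, not plyr's implementation; the temporary directory, the files a.csv and b.csv, and the names pieces and combined are assumptions made for the example.

```r
# Hypothetical demo files in a temporary directory.
tmp <- file.path(tempdir(), "named_paths_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(tmp, "b.csv"), row.names = FALSE)

# Name each path after its file, as in the answer above.
paths <- dir(tmp, pattern = "\\.csv$", full.names = TRUE)
names(paths) <- basename(paths)

# Read each file and put its name in a leading .id column.
pieces <- Map(function(path, id) {
  data.frame(.id = id, read.csv(path), stringsAsFactors = FALSE)
}, paths, names(paths))

# Stack the pieces; unname() keeps the list names out of the row names.
combined <- do.call("rbind", unname(pieces))
rownames(combined) <- NULL
print(combined)
```

The .id column name is chosen here to match what ldply produces by default; any other column name would work equally well.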

hadley answered Jan 12 '23 18:01