I have a series of csv files (one per anum) with the same column headers and different number of rows. Originally I was reading them in and merging them like so;
setwd <- ("N:/Ring data by cruise/Shetland")
LengthHeight2013 <- read.csv("N:/Ring data by cruise/Shetland/R_0113A_S2013_WD.csv",sep=",",header=TRUE)
LengthHeight2012 <- read.csv("N:/Ring data by cruise/Shetland/R_0212A_S2012_WD.csv",sep=",",header=TRUE)
LengthHeight2011 <- read.csv("N:/Ring data by cruise/Shetland/R_0211A_S2011_WOD.csv",sep=",",header=TRUE)
LengthHeight2010 <- read.csv("N:/Ring data by cruise/Shetland/R_0310A_S2010_WOD.csv",sep=",",header=TRUE)
LengthHeight2009 <- read.csv("N:/Ring data by cruise/Shetland/R_0309A_S2009_WOD.csv",sep=",",header=TRUE)
LengthHeight <- merge(LengthHeight2013,LengthHeight2012,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2011,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2010,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2009,all=TRUE)
I would like to know if there is a shorter/tidier way to do this, also considering that each time I run the script I might want to look at a different range of years.
I also found this bit of code by Tony Cookson which looks like it would do what I want, however the data frame it produces for me has only the correct headers but no data rows.
multmerge = function(mypath){
filenames=list.files(path=mypath, full.names=TRUE)
datalist = lapply(filenames, function(x){read.csv(file=x,header=T)})
Reduce(function(x,y) {merge(x,y)}, datalist)
mymergeddata = multmerge("C://R//mergeme")
Code explanation Here, the glob module helps extract file directory (path + file name with extension), Lines 10–13: We create a list type object dataFrames to keep every csv as a DataFrame at each index of that list. Line 15: We call pd. concat() method to merge each DataFrame in the list by columns, that is, axis=1 .
Find files (list.files
) and read the files in a loop (lapply
), then call (do.call
) row bind (rbind
) to put all files together by rows.
myMergedData <-
do.call(rbind,
lapply(list.files(path = "N:/Ring data by cruise"), read.csv))
Update: There is a vroom package, according to the manuals it is much faster than data.table::fread and base read.csv. The syntax looks neat, too:
library(vroom)
myMergedData <- vroom(files)
If you're looking for speed, then try this:
require(data.table) ## 1.9.2 or 1.9.3
ans = rbindlist(lapply(filenames, fread))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With