I have 3 files with 3 variables: date, ID, and price. I would like to merge them by date, so if one my current files is:
date ID Price
01/01/10 A 1
01/02/10 A 1.02
01/02/10 A 0.99
...
...
I would like to get a merged file that looks like the one below for IDs A,B and C (Pr for Price):
date Pr.A Pr.B Pr.C
01/01/10 1 NA NA
01/02/10 1.02 1.2 NA
01/03/10 0.99 1.3 1
01/04/10 NA 1.23 2
01/05/10 NA NA 3
Notice that for some dates there are not prices so in that case is an NA.
My current approach works but I feel is a bit clumsy.
setwd('~where you put the files')
library(plyr)
listnames = list.files(pattern='.csv')
pp1 = ldply(listnames,read.csv,header=T) #put all the files in a data.frame
names(pp1)=c('date','ID','price')
pp1$date = as.Date(pp1$date,format='%m/%d/%Y')
# Reshape data frame so it gets organized by date
pp1=reshape(pp1,timevar='ID',idvar='date',direction='wide')
Is there any better approach you could think of?
Looks like a job for Reduce():
# Read the files in to a single list, removing unwanted second column from each.
dataDir <- "example"
fNames <- dir(dataDir)
dataList <- lapply(file.path(dataDir, fNames),
function(X) {read.csv(X, header=TRUE)[-2]})
# Merge them
out <- Reduce(function(x,y) merge(x,y, by=1, all=TRUE), dataList)
# Construct column names
names(out)[-1] <- paste("Pr.", toupper(sub("1.csv", "", fNames)), sep="")
out
# date Pr.A Pr.B Pr.C
# 1 1/1/2010 1.00 NA NA
# 2 1/2/2010 1.02 1.20 NA
# 3 1/3/2010 0.99 1.30 1
# 4 1/4/2010 NA 1.23 2
# 5 1/5/2010 NA NA 3
Actually, your approach looks just fine to me, but I can see preferring the simplicity and transparency of syntax in a call to Reduce.
I dont have access to the files, I am behind a corporate firewall. I would use the cast method once you have built the data.frame.
res = cast(pp1,date~ID,value="Price",mean)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With