Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging .csv files with R [duplicate]

Tags:

r

plyr

I have 3 files with 3 variables: date, ID, and price. I would like to merge them by date, so if one my current files is:

date      ID  Price
01/01/10   A   1
01/02/10   A   1.02
01/02/10   A   0.99
...
...

I would like to get a merged file that looks like the one below for IDs A,B and C (Pr for Price):

date       Pr.A   Pr.B  Pr.C     
01/01/10   1      NA    NA
01/02/10   1.02   1.2   NA
01/03/10   0.99   1.3   1
01/04/10   NA     1.23  2
01/05/10   NA     NA    3

Notice that for some dates there are not prices so in that case is an NA.

My current approach works but I feel is a bit clumsy.

setwd('~where you put the files')
library(plyr)
listnames = list.files(pattern='.csv')
pp1 = ldply(listnames,read.csv,header=T) #put all the files in a data.frame

names(pp1)=c('date','ID','price')
pp1$date = as.Date(pp1$date,format='%m/%d/%Y')

# Reshape data frame so it gets organized by date
pp1=reshape(pp1,timevar='ID',idvar='date',direction='wide')

Is there any better approach you could think of?

like image 729
aatrujillob Avatar asked May 01 '26 16:05

aatrujillob


2 Answers

Looks like a job for Reduce():

# Read the files in to a single list, removing unwanted second column from each.
dataDir <- "example"
fNames <- dir(dataDir)
dataList <- lapply(file.path(dataDir, fNames),
                   function(X) {read.csv(X, header=TRUE)[-2]})

# Merge them                   
out <- Reduce(function(x,y) merge(x,y, by=1, all=TRUE), dataList)

# Construct column names
names(out)[-1] <- paste("Pr.", toupper(sub("1.csv", "", fNames)), sep="")
out
#       date Pr.A Pr.B Pr.C
# 1 1/1/2010 1.00   NA   NA
# 2 1/2/2010 1.02 1.20   NA
# 3 1/3/2010 0.99 1.30    1
# 4 1/4/2010   NA 1.23    2
# 5 1/5/2010   NA   NA    3

Actually, your approach looks just fine to me, but I can see preferring the simplicity and transparency of syntax in a call to Reduce.

like image 200
Josh O'Brien Avatar answered May 03 '26 05:05

Josh O'Brien


I dont have access to the files, I am behind a corporate firewall. I would use the cast method once you have built the data.frame.

    res = cast(pp1,date~ID,value="Price",mean)
like image 27
LouisChiffre Avatar answered May 03 '26 05:05

LouisChiffre



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!