Hi all, I'm new to R.
I have two panel data files, each with columns "id", "date" and "ret".
File A has a lot more data than file B, but I'm primarily working with the data in file B.
The combination of "id" and "date" is a unique identifier.
Is there an elegant way to look up, for each (id, date) in B, the past 10 days of ret from file A and store the result back in B?
My naive way of doing it is to loop over all rows in B:
for (i in seq_len(nrow(B))) {
  B$past10d[i] <- prod(1 + A$ret[which(A$id == B$id[i] & A$date > B$date[i] - 10 & A$date < B$date[i])]) - 1
}
but the loop takes forever.
I'd really appreciate your thoughts.
Thank you very much.
Did you try ?merge ?
"Merge two data frames by common columns or row names, or do other versions of database join operations. "
Besides, I suggest using a small local MySQL or PostgreSQL database (via RMySQL or RPostgreSQL) if you routinely use composite primary keys or the like as unique identifiers. To me, rearranging the data in SQL and then working with data frames built from a view is a lot easier than looping.
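For what it's worth, here is a minimal base-R sketch of how the merge suggestion could replace the loop. The toy data, the intermediate names m and agg, and the ".A" suffix are made up for illustration; only "id", "date", "ret" and "past10d" come from the question. It keeps the same strict window as the loop (date - 10 < A$date < date).

```r
# Toy data (hypothetical): 3 ids with 30 daily returns each
set.seed(1)
A <- data.frame(id   = rep(1:3, each = 30),
                date = rep(as.Date("2020-01-01") + 0:29, times = 3),
                ret  = rnorm(90, sd = 0.01))
B <- A[A$date %in% as.Date(c("2020-01-15", "2020-01-30")), c("id", "date")]

# Pair every row of B with all rows of A that share its id;
# A's date column becomes date.A, B's keeps the name date
m <- merge(B, A, by = "id", suffixes = c("", ".A"))

# Keep only A rows strictly inside the window (date - 10, date)
m <- m[m$date.A > m$date - 10 & m$date.A < m$date, ]

# Compound the returns within each (id, date) group
agg <- aggregate(ret ~ id + date, data = m,
                 FUN = function(r) prod(1 + r) - 1)
names(agg)[names(agg) == "ret"] <- "past10d"

# Attach the result to B; rows of B with no A rows in the window get NA
B <- merge(B, agg, by = c("id", "date"), all.x = TRUE)
```

Two caveats: the join on "id" alone creates one row per pair of B and A rows with the same id before filtering, so it may need a lot of memory when A is large, and a B row with an empty window ends up as NA here, whereas the loop would give 0.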
I think the key is to vectorize and use the %in% operator to subset data frame A. And, I know, prices are not random numbers, but I didn't want to code a random walk... I created a stock-date index using paste, but I'm sure you could use the index from pdata.frame in the plm library, which is the best I've found for panel data.
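# Toy panel: 10 stocks x 100 days (prices are just random draws)
A <- data.frame(stock=rep(1:10, each=100), date=rep(Sys.Date()-99:0, 10), price=rnorm(1000))
# B holds the last day of each stock
B <- A[seq(from=100, to=1000, by=100), ]
# Build a "stock-date" key as the first column of both data frames
A <- cbind(paste(A$stock, A$date, sep="-"), A)
B <- cbind(paste(B$stock, B$date, sep="-"), B)
colnames(A) <- colnames(B) <- c("index", "stock", "date", "price")
# Rows of A whose key also appears in B
index <- which(A[, 1] %in% B[, 1])
# Price change over the preceding 10 rows (assumes each matched row has at least 10 earlier rows of the same stock)
returns <- (A$price[index] - A$price[index-10]) / A$price[index-10]
B <- cbind(B, returns)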
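The example above works with prices, while the question has returns. The same row-offset idea can be sketched for returns by taking differences of cumulative log returns within each id. This is only a sketch under assumptions: the toy data, the helper past_k and the columns cumlog and past10 are made up, "past 10" is counted in rows (observations) rather than calendar days, and every ret must be greater than -1.

```r
# Toy data (hypothetical): 3 ids with 30 daily returns each
set.seed(42)
A <- data.frame(id   = rep(1:3, each = 30),
                date = rep(as.Date("2020-01-01") + 0:29, times = 3),
                ret  = rnorm(90, sd = 0.01))
A <- A[order(A$id, A$date), ]

# Cumulative log return within each id
A$cumlog <- ave(log1p(A$ret), A$id, FUN = cumsum)

# Compounded return over the k rows before each row
# (NA when fewer than k earlier rows exist for that id)
k <- 10
past_k <- function(cl) {
  n <- length(cl)
  prev <- c(0, cl)[seq_len(n)]                       # cumulative log return through row i - 1
  base <- c(rep(NA_real_, k), c(0, cl))[seq_len(n)]  # cumulative log return through row i - k - 1
  expm1(prev - base)
}
A$past10 <- ave(A$cumlog, A$id, FUN = past_k)

# Look up the values for the (id, date) pairs in B
B <- A[A$date == as.Date("2020-01-30"), c("id", "date")]
B <- merge(B, A[, c("id", "date", "past10")], by = c("id", "date"), all.x = TRUE)
```

Because everything is computed once on A with cumsum, no loop over the rows of B is needed; the final merge is a plain key lookup.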