I have multiple frames, for the purpose suppose 2. Each frame comprises 2 columns - an index column, and a value column
sz<-5;
frame_1<-data.frame(index=sort(sample(1:10,sz,replace=F)),value=rpois(sz,50));
frame_2<-data.frame(index=sort(sample(1:10,sz,replace=F)),value=rpois(sz,50));
frame_1:
index value
1 49
6 62
7 58
8 30
10 50
frame_2:
index value
4 60
5 64
6 48
7 46
9 57
The goal is to create a third frame, frame_3, whose indices will be the union of those in frame_1 and frame_2,
frame_3<-data.frame(index = sort(union(frame_1$index,frame_2$index)));
and which will comprise two additional columns, value_1 and value_2.
frame_3$value_1 will be filled out from frame_1$value, frame_3$value_2 will be filled out from frame_2$value;
These should be filled out like so: frame_3:
index value_1 value_2
1 49 NA
4 49 60 # value_1 is filled through with previous value
5 49 64 # value_1 is filled through with previous value
6 62 48
7 58 46
8 30 46 # value_2 is filled through with previous value
9 30 57 # value_1 is filled through with previous value
10 50 57 # value_1 is filled through with previous value
i'm looking for an efficient solution, as im dealing with records in the hundreds of thousands
This problem screams for data.table
. You can use a loop to recursively construct columns one by one using x[y, roll=TRUE]
.
require(data.table)
dt1 <- data.table(frame_1)
dt2 <- data.table(frame_2)
setkey(dt1, index)
setkey(dt2, index)
dt3 <- data.table(index = sort(unique(c(dt1$index, dt2$index))))
> dt1[dt2[dt3, roll=TRUE], roll=TRUE]
# index value value.1
# 1: 1 49 NA
# 2: 4 49 60
# 3: 5 49 64
# 4: 6 62 48
# 5: 7 58 46
# 6: 8 30 46
# 7: 9 30 57
# 8: 10 50 57
If your data.frames aren't very large, you can just use merge
combined with zoo::na.locf
.
R> library(zoo)
R> frame_3 <- merge(frame_1, frame_2, by="index",
+ all=TRUE, suffixes=paste(".",1:2,sep=""))
R > (frame_3 <- na.locf(frame_3))
index value.1 value.2
1 1 49 NA
2 4 49 60
3 5 49 64
4 6 62 48
5 7 58 46
6 8 30 46
7 9 30 57
8 10 50 57
Or, just use zoo objects to begin with, assuming your "value" columns are all one type (like a matrix, you can't mix types in zoo objects).
R> z1 <- zoo(frame_1$value, frame_1$index)
R> z2 <- zoo(frame_2$value, frame_2$index)
R> (z3 <- na.locf(merge(z1, z2)))
z1 z2
1 49 NA
4 49 60
5 49 64
6 62 48
7 58 46
8 30 46
9 30 57
10 50 57
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With