Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R date time aligning and fill through values

I have multiple frames, for the purpose suppose 2. Each frame comprises 2 columns - an index column, and a value column

sz<-5;
frame_1<-data.frame(index=sort(sample(1:10,sz,replace=F)),value=rpois(sz,50));
frame_2<-data.frame(index=sort(sample(1:10,sz,replace=F)),value=rpois(sz,50));

frame_1:

 index value
  1    49
  6    62
  7    58
  8    30
 10    50

frame_2:

index value
  4    60
  5    64
  6    48
  7    46
  9    57

The goal is to create a third frame, frame_3, whose indices will be the union of those in frame_1 and frame_2,

frame_3<-data.frame(index = sort(union(frame_1$index,frame_2$index)));

and which will comprise two additional columns, value_1 and value_2.

frame_3$value_1 will be filled out from frame_1$value, frame_3$value_2 will be filled out from frame_2$value;

These should be filled out like so: frame_3:

index value_1 value_2
1      49       NA
4      49       60     # value_1 is filled through with previous value
5      49       64     # value_1 is filled through with previous value
6      62       48     
7      58       46   
8      30       46     # value_2 is filled through with previous value
9      30       57     # value_1 is filled through with previous value
10     50       57     # value_1 is filled through with previous value

i'm looking for an efficient solution, as im dealing with records in the hundreds of thousands

like image 480
Aditya Sihag Avatar asked Feb 01 '13 11:02

Aditya Sihag


2 Answers

This problem screams for data.table. You can use a loop to recursively construct columns one by one using x[y, roll=TRUE].

require(data.table)
dt1 <- data.table(frame_1)
dt2 <- data.table(frame_2)
setkey(dt1, index)
setkey(dt2, index)
dt3 <- data.table(index = sort(unique(c(dt1$index, dt2$index))))
> dt1[dt2[dt3, roll=TRUE], roll=TRUE]

#    index value value.1
# 1:     1    49      NA
# 2:     4    49      60
# 3:     5    49      64
# 4:     6    62      48
# 5:     7    58      46
# 6:     8    30      46
# 7:     9    30      57
# 8:    10    50      57
like image 120
Arun Avatar answered Sep 20 '22 22:09

Arun


If your data.frames aren't very large, you can just use merge combined with zoo::na.locf.

R> library(zoo)
R> frame_3 <- merge(frame_1, frame_2, by="index",
+                  all=TRUE, suffixes=paste(".",1:2,sep=""))
R > (frame_3 <- na.locf(frame_3))
  index value.1 value.2
1     1      49      NA
2     4      49      60
3     5      49      64
4     6      62      48
5     7      58      46
6     8      30      46
7     9      30      57
8    10      50      57

Or, just use zoo objects to begin with, assuming your "value" columns are all one type (like a matrix, you can't mix types in zoo objects).

R> z1 <- zoo(frame_1$value, frame_1$index)
R> z2 <- zoo(frame_2$value, frame_2$index)
R> (z3 <- na.locf(merge(z1, z2)))
   z1 z2
1  49 NA
4  49 60
5  49 64
6  62 48
7  58 46
8  30 46
9  30 57
10 50 57
like image 30
Joshua Ulrich Avatar answered Sep 22 '22 22:09

Joshua Ulrich