Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

add missing rows to a data table

Tags:

r

data.table

I have a data table:

library(data.table)
(f <- data.table(id1=c(1,2,3,1,2,3),
                 id2=as.factor(c("a","a","b","c","b","d")),
                 v=1:6,
                 key=c("id1","id2")))
   id1 id2 v
1:   1   a 1
2:   1   c 4
3:   2   a 2
4:   2   b 5
5:   3   b 3
6:   3   d 6
> str(f)
Classes ‘data.table’ and 'data.frame':  6 obs. of  3 variables:
 $ id1: num  1 1 2 2 3 3
 $ id2: Factor w/ 4 levels "a","b","c","d": 1 3 1 2 2 4
 $ v  : int  1 4 2 5 3 6
 - attr(*, "sorted")= chr  "id1" "id2"
 - attr(*, ".internal.selfref")=<externalptr> 

How do I add the "missing" rows?

I.e., for each existing id1 I want all possible values of id2 to be present (with v=0).

So, I need to add 6 rows (3 possible values of id1 * 4 possible values of id2 - 6 existing rows).

like image 633
sds Avatar asked Mar 17 '14 18:03

sds


2 Answers

I'd get the unique values in id1 and id2 and do a join using data.table's cross join function CJ as follows:

# if you've already set the key:
ans <- f[CJ(id1, id2, unique=TRUE)][is.na(v), v := 0L][]

# or, if f is not keyed:
ans <- f[CJ(id1 = id1, id2 = id2, unique=TRUE), on=.(id1, id2)][is.na(v), v := 0L][]

ans
like image 184
Arun Avatar answered Nov 08 '22 09:11

Arun


f[, {
  tab = table(id2)
  x = as.numeric(tab)
  x[x != 0] = v
  list(id2 = names(tab), v = x)
}, by = id1]

##    id1 id2 v
##  1:   1   a 1
##  2:   1   b 0
##  3:   1   c 4
##  4:   1   d 0
##  5:   2   a 2
##  6:   2   b 5
##  7:   2   c 0
##  8:   2   d 0
##  9:   3   a 0
## 10:   3   b 3
## 11:   3   c 0
## 12:   3   d 6
like image 30
Jake Burkhead Avatar answered Nov 08 '22 09:11

Jake Burkhead