R: find first non-NA observation in data.table column by group

Tags: r, data.table

I have a data.table with many missing values, and I want a variable that gives me a 1 for the first non-missing value in each group.

Say I have such a data.table:

library(data.table)
DT <- data.table(iris)[,.(Petal.Width,Species)]
DT[c(1:10,15,45:50,51:70,101:134),Petal.Width:=NA]

which now has missing values at the beginning, at the end and in between. I have tried two versions. The first is:

DT[min(which(!is.na(Petal.Width))),first_available:=1,by=Species]

but it only finds the global minimum (in this case, setosa gets the correct 1), not the minimum by group. I think this is because data.table first subsets by i and only then groups, correct? So it only works on the single row that is the global minimum of which(!is.na(Petal.Width)), i.e. the first non-NA value in the whole table.

A second attempt with the test in j:

DT[,first_available:= ifelse(min(which(!is.na(Petal.Width))),1,0),by=Species]

which just returns a column of 1s. Here, I don't have a good explanation as to why it doesn't work.

My goal is this:

DT[,first_available:=0]
DT[c(11,71,135),first_available:=1]

but in reality I have hundreds of groups. Any help would be appreciated!

Edit: this question does come close, but it is not targeted at NAs and does not solve the issue here, if I understand it correctly. I tried:

DT <- data.table(DT, key = c('Species'))
DT[unique(DT[,key(DT), with = FALSE]), mult = 'first']

Jakob asked Jun 09 '16 10:06


1 Answer

Here's one way:

DT[!is.na(Petal.Width), first := as.integer(seq_len(.N) == 1L), by = Species]
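
As a rough check against the example data from the question, the line above can be run end to end as in the sketch below. Note that rows excluded by the i subset are never assigned, so they end up as NA rather than 0; the fill step and the `first_j` column (a variant with the test in j) are additions for illustration, not part of the original answer.

```r
library(data.table)

# Example data from the question
DT <- data.table(iris)[, .(Petal.Width, Species)]
DT[c(1:10, 15, 45:50, 51:70, 101:134), Petal.Width := NA]

# Subset to non-NA rows in i, then flag the first remaining row of each group
DT[!is.na(Petal.Width), first := as.integer(seq_len(.N) == 1L), by = Species]

# Rows with NA in Petal.Width were skipped by i and hold NA; set them to 0
DT[is.na(first), first := 0L]

# Variant with the test in j (illustrative): compare each row's position in
# its group with the position of the group's first non-NA value.
# %in% yields FALSE everywhere if a group has no non-NA value at all.
DT[, first_j := as.integer(seq_len(.N) %in% which(!is.na(Petal.Width))[1L]),
   by = Species]

which(DT$first == 1L)    # 11 71 135
which(DT$first_j == 1L)  # 11 71 135
```

This also suggests why the ifelse() attempt in the question returned only 1s: min(which(...)) is a positive row position, and ifelse() treats any non-zero number as TRUE, so the result is 1, recycled over the whole group.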

Arun answered Nov 15 '22 00:11