I have a data.table with many missing values, and I want a variable that gives me a 1 for the first non-missing value in each group.
Say I have such a data.table:
library(data.table)
DT <- data.table(iris)[,.(Petal.Width,Species)]
DT[c(1:10,15,45:50,51:70,101:134),Petal.Width:=NA]
which now has missing values at the beginning, at the end, and in between. I have tried two versions. The first is:
DT[min(which(!is.na(Petal.Width))),first_available:=1,by=Species]
but it only finds the global minimum (in this case, setosa gets the correct 1), not the minimum by group. I think this is the case because data.table first subsets by i, then sorts by group, correct? So it will only work with the row that is the global minimum of which(!is.na(Petal.Width)), which is the first non-NA value.
A second attempt with the test in j:
DT[,first_available:= ifelse(min(which(!is.na(Petal.Width))),1,0),by=Species]
which just returns a column of 1s. Here, I don't have a good explanation as to why it doesn't work.
My goal is this:
DT[,first_available:=0]
DT[c(11,71,135),first_available:=1]
but in reality I have hundreds of groups. Any help would be appreciated!
Edit: this question does come close, but it is not targeted at NAs and, if I understand it correctly, does not solve the issue here. I tried:
DT <- data.table(DT, key = c('Species'))
DT[unique(DT[,key(DT), with = FALSE]), mult = 'first']
Here's one way:
DT[!is.na(Petal.Width), first := as.integer(seq_len(.N) == 1L), by = Species]
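Note that the line above only assigns within the rows selected by i, so rows where Petal.Width is NA end up with NA in the new column rather than 0. If you want 0 everywhere else, as in the goal from the question, here is a minimal sketch of one alternative. It assumes the example data from the question. The helper name first_rows is only illustrative, and the filter for all-NA groups is my own addition for safety, since the example has no such group.

```r
library(data.table)

# Rebuild the example data from the question
DT <- data.table(iris)[, .(Petal.Width, Species)]
DT[c(1:10, 15, 45:50, 51:70, 101:134), Petal.Width := NA]

# Start from 0 everywhere
DT[, first_available := 0L]

# .I holds the row numbers in the full table; within each group, keep the
# row numbers of non-NA values and take the first one.
# first_rows is a hypothetical helper name, not part of data.table.
first_rows <- DT[, .I[!is.na(Petal.Width)][1L], by = Species]$V1

# A group with only NAs would yield NA here, so drop those (assumed edge case)
first_rows <- first_rows[!is.na(first_rows)]

# Flag the first non-NA row of each group
DT[first_rows, first_available := 1L]

# Row numbers that were flagged
flagged <- which(DT$first_available == 1L)
print(flagged)

# Side note on the second attempt in the question: min(which(...)) is a
# positive row position, and ifelse() coerces any non-zero number to TRUE,
# so the test is always TRUE and every row gets 1.
always_one <- ifelse(11, 1, 0)
```

On the example data this flags rows 11, 71 and 135, which matches the rows set by hand in the question.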