Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

subset row in data.table by id

Tags:

r

data.table

I feel there must be a simple data.table solution for this problem. I have the following data:

library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

I want to for each unique id to create a new column income that takes the value in income_year0 (if not NA), otherwise the value in income_year1 (if not NA), otherwise the value in income_year2, and if all are NA, then income is NA.

That is, I want one row per id with one income column like so:

data_want <- data.table(
    id = c(1,2,3),
    income = c(100,200,255)
)
like image 651
epiNS Avatar asked Mar 26 '21 08:03

epiNS


2 Answers

You can unlist the columns and select the first non-NA value.

library(data.table)
data[, .(income = na.omit(unlist(.SD))[1]), id]

#   id income
#1:  1    100
#2:  2    200
#3:  3    255
like image 200
Ronak Shah Avatar answered Oct 30 '22 11:10

Ronak Shah


Another option with as.matrix + is.na

> data[, .(income = first(as.matrix(.SD)[!is.na(.SD)])), id]
   id income
1:  1    100
2:  2    200
3:  3    255
like image 20
ThomasIsCoding Avatar answered Oct 30 '22 12:10

ThomasIsCoding