Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get value of last non-NA row per column in data.table

Tags:

r

data.table

I have a datatable where each column represents a time series, and I want to grab the last NA value per time series in a column-ordered manner. In my particular use case my data looks like this:

a   b     c
1   2     5
1   -17   9
NA  11    4
NA  57    NA
63  NA    NA

So out of this I would like to extract:

a   b    c
63  57   4

How can I accomplish this? So far I only see answers addressing the converse situation of extracting the last non-NA per row rather than per column.

like image 945
helloB Avatar asked Oct 08 '16 15:10

helloB


2 Answers

For someone wishing to use only base R.

sapply(df, function(x) x[max(which(!is.na(x)))])

where

df <- data.frame(a = c(1, 1, NA, NA, 63),
                 b = c(2, -17, 11, 57, NA),
                 c = c(5, 9, 4, NA, NA))
like image 52
sindri_baldur Avatar answered Oct 19 '22 11:10

sindri_baldur


If the dataset is data.table, loop through the Subset of Data.table (.SD), subset the non-NA element (x[!is.na(x)]) and extract the last element among those with tail.

df1[, lapply(.SD, function(x) tail(x[!is.na(x)],1))]
#   a  b c
#1: 63 57 4
like image 40
akrun Avatar answered Oct 19 '22 09:10

akrun