How to get na.omit with data.table to only omit NAs in each column

Tags: r, data.table

Let's say I have

library(data.table)
az<-data.table(a=1:6,b=6:1,c=4)
az[b==4,c:=NA]
az
   a b  c
1: 1 6  4
2: 2 5  4
3: 3 4 NA
4: 4 3  4
5: 5 2  4
6: 6 1  4

I can get the sum of all the columns with

az[,lapply(.SD,sum)]
    a  b  c
1: 21 21 NA

This is what I want for a and b, but c is NA. This seems easy enough to fix by doing

az[,lapply(na.omit(.SD),sum)]
    a  b  c
1: 18 17 20

This is what I want for c, but I didn't want to omit the values of a and b in the row where c is NA. This is a contrived example; in my real data there could be 1000+ columns with NAs scattered randomly throughout. Is there a way to get na.omit, or something else, to act per column instead of on the whole table, without relying on looping through each column as a vector?

Dean MacGregor asked May 29 '13 18:05



1 Answer

Expanding on my comment:

Many base functions allow you to decide how to treat NA. For example, sum has the argument na.rm:

az[,lapply(.SD,sum,na.rm=TRUE)]

In general, you can also use the function na.omit on each vector individually:

az[,lapply(.SD,function(x) sum(na.omit(x)))]
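
To check that both forms handle NAs column by column, here is a minimal sketch that rebuilds the table from the question and compares the two results. The names res_narm and res_omit are only illustrative and do not appear in the question.

```r
library(data.table)

# Rebuild the example table from the question.
az <- data.table(a = 1:6, b = 6:1, c = 4)
az[b == 4, c := NA]

# lapply() forwards na.rm = TRUE to sum(), so NAs are dropped
# within each column independently; no whole rows are removed.
res_narm <- az[, lapply(.SD, sum, na.rm = TRUE)]

# Same idea with na.omit() applied to each column vector in turn.
res_omit <- az[, lapply(.SD, function(x) sum(na.omit(x)))]

# Both give a = 21, b = 21, c = 20.
print(res_narm)
print(res_omit)
```

The first form only works for functions that accept na.rm (sum, mean, max and so on). Wrapping the column in na.omit() should work with any function that takes a vector.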
Blue Magister answered Nov 15 '22 08:11
