Remove constant columns with or without NAs

Tags: r, data.table

I am trying to get many lm models to work inside a function, and I need to automatically drop constant columns from my data.table. That is, I want to keep only columns with two or more unique values, not counting NA.

I tried several methods found on SO, but I still cannot drop columns whose only values are a single constant and NA.

My reproducible code:

library(data.table)
df <- data.table(x = c(1, 2, 3, NA, 5), y = c(1, 1, NA, NA, NA),
                 z = c(NA, NA, NA, NA, NA), d = c(2, 2, 2, 2, 2))

> df
    x  y  z d
1:  1  1 NA 2
2:  2  1 NA 2
3:  3 NA NA 2
4: NA NA NA 2
5:  5 NA NA 2

My intention is to drop columns y, z, and d because they are constant, including y, which has only one unique value once NAs are omitted.

I tried this:

same <- sapply(df, function(.col) { all(is.na(.col)) || all(.col[1L] == .col) })
df1 <- df[ , !same, with = FALSE]


> df1
    x  y
1:  1  1
2:  2  1
3:  3 NA
4: NA NA
5:  5 NA

As you can see, y is still there. Any help?
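The reason y survives can be checked directly: `NA == 1` is NA rather than FALSE, and `all()` returns NA when its input contains TRUE and NA but no FALSE. So for y the test becomes `FALSE || NA`, which is NA, and y is not flagged as constant. A minimal diagnostic sketch (df is redefined so the block runs on its own); the "fixed" variant at the end is one possible repair of the original approach, not the answer below:

```r
library(data.table)

df <- data.table(x = c(1, 2, 3, NA, 5), y = c(1, 1, NA, NA, NA),
                 z = c(NA, NA, NA, NA, NA), d = c(2, 2, 2, 2, 2))

# Comparing y[1] with every element of y: the NA positions give NA, not FALSE
print(df$y[1L] == df$y)

# all() over TRUE/NA values (no FALSE) returns NA, so y's entry in `same` is NA
same <- sapply(df, function(.col) all(is.na(.col)) || all(.col[1L] == .col))
print(same)

# One possible fix: remove NAs first, then compare against the first remaining value
same_fixed <- sapply(df, function(.col) {
  vals <- .col[!is.na(.col)]
  length(vals) == 0L || all(vals == vals[1L])  # all-NA columns count as constant
})
df1 <- df[, !same_fixed, with = FALSE]
print(names(df1))  # "x"
```

Comparing against the first non-NA value (rather than `.col[1L]` with `na.rm = TRUE`) avoids misclassifying a column whose first element happens to be NA.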

asked Jan 14 '18 at 20:01 by COLO





1 Answer

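Because you have a data.table, you can use uniqueN with its na.rm argument. For columns with at most one distinct non-NA value the anonymous function returns NULL, and those columns are dropped from the result: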

df[ , lapply(.SD, function(v) if(uniqueN(v, na.rm = TRUE) > 1) v)]
#     x
# 1:  1
# 2:  2
# 3:  3
# 4: NA
# 5:  5
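Since the goal is to clean data automatically before fitting lm models, the same uniqueN idea can be wrapped in a reusable helper. This is a minimal sketch under two assumptions: the name `drop_constant_cols` is hypothetical, and the columns are deleted by reference with `:=`, so the table passed in is itself modified rather than copied.

```r
library(data.table)

# Hypothetical helper: deletes (by reference) every column with at most
# one distinct value once NAs are ignored, then returns the table invisibly
drop_constant_cols <- function(dt) {
  n_distinct <- sapply(dt, uniqueN, na.rm = TRUE)  # distinct non-NA values per column
  const_cols <- names(n_distinct)[n_distinct <= 1L]
  if (length(const_cols) > 0L) dt[, (const_cols) := NULL]
  invisible(dt)
}

df <- data.table(x = c(1, 2, 3, NA, 5), y = c(1, 1, NA, NA, NA),
                 z = c(NA, NA, NA, NA, NA), d = c(2, 2, 2, 2, 2))
res <- drop_constant_cols(df)
print(names(res))  # "x"
```

The `<= 1` threshold treats both all-NA columns (zero distinct values) and single-value columns as constant. If the original table must stay untouched, pass `copy(df)` instead.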

A base R alternative:

Filter(function(x) length(unique(x[!is.na(x)])) > 1, df)
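The base version also works on an ordinary data.frame: a data.frame is a list of columns, so Filter applies the predicate to each column and keeps those for which it returns TRUE. A small self-contained sketch; `has_variation` is only an illustrative name:

```r
# Base R only: a plain data.frame, no data.table required
df <- data.frame(x = c(1, 2, 3, NA, 5), y = c(1, 1, NA, NA, NA),
                 z = c(NA, NA, NA, NA, NA), d = c(2, 2, 2, 2, 2))

# TRUE when a column has more than one distinct non-NA value
has_variation <- function(col) length(unique(col[!is.na(col)])) > 1

res <- Filter(has_variation, df)
print(names(res))  # "x"
```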

answered Nov 21 '22 at 04:11 by Henrik
