Show columns with NAs in a data.frame

I'd like to show the names of columns in a large dataframe that contain missing values. Basically, I want the equivalent of complete.cases(df) but for columns, not rows. Some of the columns are non-numeric, so something like

names(df[is.na(colMeans(df))])

returns "Error in colMeans(df) : 'x' must be numeric". My current solution is to transpose the data frame and run complete.cases on the result, but I'm guessing some variant of apply (or something in plyr) would be much more efficient.

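nacols <- function(df) {
  # drop = FALSE keeps a data frame even when only one column has NAs,
  # so names() still returns that column's name
  names(df[, !complete.cases(t(df)), drop = FALSE])
}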

w <- c("hello","goodbye","stuff")
x <- c(1,2,3)
y <- c(1,NA,0)
z <- c(1,0, NA)
tmp <- data.frame(w,x,y,z)

nacols(tmp)
[1] "y" "z"

Can someone show me a more efficient function to identify columns that have NAs?

asked May 13 '12 18:05

Moira

2 Answers

This is the fastest way that I know of:

unlist(lapply(df, function(x) any(is.na(x))))

EDIT:

Everyone else wrote their solution out as a complete function, so here is mine:

nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}
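For reference, here is a minimal sketch of the same idea applied to the example data from the question. It swaps unlist(lapply(...)) for vapply, which is an assumption about style rather than part of the answer above; vapply checks that every column yields exactly one logical value. The name nacols_vapply is hypothetical.

```r
# Example data from the question
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# Hypothetical variant: vapply() returns a named logical vector,
# one TRUE/FALSE per column, and errors if a column returns anything else
nacols_vapply <- function(df) {
  has_na <- vapply(df, function(x) any(is.na(x)), logical(1))
  names(df)[has_na]
}

nacols_vapply(tmp)
# [1] "y" "z"
```

Because the check runs column by column, the character column w never has to be coerced to numeric, which is what made colMeans fail in the question.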

And here are the microbenchmark timings for the four solutions on a Windows 7 machine:

Unit: microseconds
    expr     min      lq  median      uq        max
1 ANDRIE  85.380  91.911 106.375 116.639    863.124
2 MANOEL  87.712  93.778 105.908 118.971   8426.886
3  MOIRA 764.215 798.273 817.402 876.188 143039.632
4  TYLER  51.321  57.853  62.518  72.316   1365.136

And here's a visual of that: [image: plot of the benchmark timings]
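A rough sketch of how a comparison like this could be rerun, assuming the microbenchmark package is installed. Only the three approaches whose code appears on this page are included, the function names are hypothetical labels, and timings will differ by machine and data size.

```r
# Example data from the question
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# The question's transpose approach (drop = FALSE so a single match keeps its name)
moira <- function(df) names(df[, !complete.cases(t(df)), drop = FALSE])

# Column-wise any(is.na(x)) via lapply
tyler <- function(df) colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]

# Count NAs per column and keep columns with at least one
manoel <- function(df) colnames(df)[colSums(is.na(df)) > 0]

# Only run the timing if the (assumed) microbenchmark package is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    MOIRA  = moira(tmp),
    TYLER  = tyler(tmp),
    MANOEL = manoel(tmp),
    times  = 100L
  ))
}
```

On a three-row data frame the differences are tiny; a larger data frame would give a more meaningful comparison.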

Edit: At the time I wrote this, anyNA either did not exist or I was unaware of it. It may speed things up further. Per the help page ?anyNA:

The generic function anyNA implements any(is.na(x)) in a possibly faster way (especially for atomic vectors).

nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) anyNA(x)))]
}
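Since anyNA already takes a single argument, the anonymous wrapper can be dropped. The following is a sketch under that assumption, with the hypothetical name nacols_anyna, and it also shows what comes back when no column has missing values.

```r
# Hypothetical compact variant: pass anyNA directly to vapply()
nacols_anyna <- function(df) {
  names(df)[vapply(df, anyNA, logical(1))]
}

# Example data from the question
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

nacols_anyna(tmp)
# [1] "y" "z"

# A data frame with no missing values gives an empty character vector
complete_df <- data.frame(a = 1:3, b = c("x", "y", "z"))
nacols_anyna(complete_df)
# character(0)
```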

answered Nov 18 '22 15:11

Tyler Rinker

Here is one way:

colnames(tmp)[colSums(is.na(tmp)) > 0]
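To see why this works, it may help to look at the intermediate values on the question's example data. The sketch below only uses base R; na_counts is a hypothetical name for illustration.

```r
# Example data from the question
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# is.na() on a data frame returns a logical matrix of the same shape;
# colSums() then counts the TRUE values (missing entries) in each column
na_counts <- colSums(is.na(tmp))
na_counts
# w = 0, x = 0, y = 1, z = 1

# Keep the names of columns with at least one missing value
colnames(tmp)[na_counts > 0]
# [1] "y" "z"
```

A side benefit of this approach is that na_counts also tells you how many values are missing per column, not just whether any are.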

Hope it helps,

Manoel

answered Nov 18 '22 13:11

Manoel Galdino
