In R, can I make the table() function return the number of NA values in a named element?


I am using R to summarize a large amount of data for a report. I want to use lapply() to generate a list of tables from the table() function, from which I can extract the statistics I need. There are a lot of these, so I've written a function to do it. My problem is returning the number of missing (NA) values: each table already contains that count, but I can't figure out how to tell R which element of the table holds it. As far as I can tell, R "names" that element NA, and I can't select it by that name.
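To make the problem concrete, here is a small sketch using a made-up vector v (not part of the original data). With useNA = "always", the name of the missing-value cell is a real NA_character_ rather than the string "NA". So neither tbl["NA"] nor tbl[NA_character_] reaches it (both give NA), while a logical index built from is.na(names(tbl)) does:

```r
# Made-up vector with one missing value
v <- c("A", "B", "A", NA, "C")
tbl <- table(v, useNA = "always")

names(tbl)               # "A" "B" "C" and then a real NA, not the string "NA"
tbl["NA"]                # no cell is named "NA", so this returns NA
tbl[NA_character_]       # NA subscripts never match names, so this is NA too
tbl[is.na(names(tbl))]   # the missing-value cell, with a count of 1
```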

I'm trying to avoid writing an expression like x[is.na(names(x)) | names(x) == "var_I_want"] for every table, because it feels really wordy. I was hoping there was some way either to tell R to label the NA cell in each table with a character name, or to tell it to pick the one labeled NA, but I haven't had much luck yet.
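For what it's worth, base R's %in% may already be close to the short form being asked for. %in% is built on match(), and match() treats NA as matching NA, so a single logical index can pick a named level and the NA cell together. A minimal sketch on a made-up vector:

```r
# Made-up vector with one missing value
x <- c("Yes", "No", "No", NA, "Yes")
tbl <- table(x, useNA = "always")

# match() (and therefore %in%) treats NA as matching NA, so this one
# logical index keeps both the "Yes" cell and the missing-value cell
picked <- tbl[names(tbl) %in% c("Yes", NA)]
picked
```

The selected cells come back in the table's own order (levels first, NA last), not in the order given to c().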

Minimal example:

example <- data.frame(ID=c(10,20,30,40,50),
                      V1=c("A","B","A",NA,"C"),
                      V2=c("Dog","Cat",NA,"Cat","Bunny"),
                      V3=c("Yes","No","No","Yes","No"),
                      V4=c("No",NA,"No","No","Yes"),
                      V5=c("No","Yes","Yes",NA,"No"))

varlist <- c("V1","V2","V3","V4","V5")

list_o_tables <- lapply(X=example[varlist],FUN=table,useNA="always")

list(V1=list_o_tables[["V1"]]["A"],
     V2=list_o_tables[["V2"]]["Cat"],
     V3=list_o_tables[["V3"]]["Yes"],
     V4=list_o_tables[["V4"]]["Yes"],
     V5=list_o_tables[["V5"]]["Yes"])

What I get:

$V1
A 
2 

$V2
Cat 
  2 

$V3
Yes 
  2 

$V4
Yes 
  1 

$V5
Yes 
  2

What I'd like:

$V1
A     <NA>
2       1

$V2
Cat   <NA>
  2     1

$V3
Yes   <NA> 
  2     0

$V4
Yes   <NA> 
  1     1

$V5
Yes   <NA> 
  2     1
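One possible way to get exactly this output is sketched below, assuming the level to report for each variable can be written down in a lookup vector (wanted is a hypothetical name, not from the question). Each table is indexed with names(tbl) %in% c(level, NA); because %in% uses match(), which treats NA as matching NA, the missing-value cell is kept alongside the wanted level. The data and list_o_tables are rebuilt from the minimal example so the snippet runs on its own:

```r
example <- data.frame(ID = c(10, 20, 30, 40, 50),
                      V1 = c("A", "B", "A", NA, "C"),
                      V2 = c("Dog", "Cat", NA, "Cat", "Bunny"),
                      V3 = c("Yes", "No", "No", "Yes", "No"),
                      V4 = c("No", NA, "No", "No", "Yes"),
                      V5 = c("No", "Yes", "Yes", NA, "No"))
varlist <- c("V1", "V2", "V3", "V4", "V5")
list_o_tables <- lapply(X = example[varlist], FUN = table, useNA = "always")

# Hypothetical lookup: which level to report for each variable
wanted <- c(V1 = "A", V2 = "Cat", V3 = "Yes", V4 = "Yes", V5 = "Yes")

# For each table, keep the wanted level plus the NA cell;
# Map() takes its result names from list_o_tables
result <- Map(function(tbl, level) tbl[names(tbl) %in% c(level, NA)],
              list_o_tables, wanted[names(list_o_tables)])
result
```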


asked Dec 06 '13 22:12

TARehman

1 Answer

This is ugly (IMHO) but it works:

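my_table <- function(x){
    # Tabulate x, always including an NA cell, then relabel the cells with the
    # sorted non-missing values followed by the string 'NA'. This assumes x is
    # a character or numeric vector, so the sorted values match table()'s order.
    setNames(table(x, useNA = "always"), c(sort(unique(x[!is.na(x)])), 'NA'))
}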

So you'd lapply() this function instead of table(), and then the NA count is available under the name "NA".
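A slightly different take on the same renaming idea, sketched here with a hypothetical function name table_na, leaves table()'s own labels alone and only replaces the missing name. This avoids rebuilding the labels with sort(), which can get out of step with the table, for example when x is a factor with unused levels:

```r
# Hypothetical helper: count values with useNA = "always", then rename only
# the missing-value cell to the string "NA" so it can be indexed by name
table_na <- function(x) {
  tbl <- table(x, useNA = "always")
  names(tbl)[is.na(names(tbl))] <- "NA"
  tbl
}

# Two columns from the question's example data
example <- data.frame(V1 = c("A", "B", "A", NA, "C"),
                      V4 = c("No", NA, "No", "No", "Yes"))
tabs <- lapply(example, table_na)
tabs$V1[c("A", "NA")]
tabs$V4[c("Yes", "NA")]
```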

Looking more closely, this comes from the behavior of factor(), which table() uses to build its categories:

levels(factor(c(1,NA,2),exclude = NULL))
[1] "1" "2" NA
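A short sketch of the same point, reusing the vector from the line above (f and g are just illustrative names): the third level is a missing value rather than the two-character string "NA", and base R's addNA() produces the same set of levels.

```r
f <- factor(c(1, NA, 2), exclude = NULL)
levels(f)             # "1" "2" and then a missing level
is.na(levels(f))      # only the last level is NA

# addNA() is another base R way to make NA an explicit level
g <- addNA(factor(c(1, NA, 2)))
identical(levels(f), levels(g))
```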

My recollection is that the distinction between a factor level of NA versus "NA" has been at the very least a source of confusion in R in the past. I feel like I've seen some debates about the merits of this on r-devel, but I can't recall for sure at the moment.

So the issue is: if you have a factor with NA values, what do you call that level? Technically, table() is correct here: one of the levels is missing, not literally the string "NA". It would be nice (IMHO) if table() didn't adhere to this quite so strictly, though.
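The distinction matters in practice when the data can contain the literal string "NA" (say, a coded answer meaning "not applicable") as well as genuinely missing values. In this made-up example both end up in one table, and only the literal one can be reached by name. Relabeling the missing cell as 'NA' would then leave two cells with the same label:

```r
# Made-up data: "NA" is a real answer here, NA is a missing response
y <- c("NA", "Yes", NA, "NA")
tbl <- table(y, useNA = "always")

names(tbl)               # the string "NA", "Yes", and a missing name
tbl["NA"]                # reaches the literal "NA" level (count 2)
tbl[is.na(names(tbl))]   # reaches the missing-value cell (count 1)
```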

answered Nov 11 '22 12:11

joran

