R: "apply" statement to take the sum of the number of non-NA values across multiple columns

I have a large dataframe of doctor visit records. Each record (row) can have up to 11 diagnosis codes. I want to know how many non-NA diagnosis codes are in each row.

Here is a sample of the data:

diag1 diag2 diag3 diag4 diag5 diag6 diag7 diag8 diag9 diag10 diag11
786   272   401   782    250  91912  530    NA    NA    NA     NA   
845   530   338   311    NA    NA    NA     NA    NA    NA     NA

So in these two rows, I would want to know that row 1 had 7 codes and row 2 had 4 codes. The dataframe is 31,596 rows so a loop is taking way too long. I'd like to use an "apply" statement to speed things up:

z = apply(y[,paste("diag", 1:11, sep="")], 1, function(x)sum({any(x[!is.na(x)])}))

R just returns a vector of 1's that is the same length as the number of rows in the dataset. I think something is wrong with using "any"? Does anyone have a good way to count the number of non-NA values across multiple columns? Thanks!

asked May 07 '12 17:05 by mEvans

1 Answer

Just use is.na and rowSums:

z <- rowSums(!is.na(y[,paste("diag", 1:11, sep="")]))

answered Sep 24 '22 03:09 by Joshua Ulrich
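
For anyone wondering why the original attempt returned all 1's: any() collapses each row to a single TRUE/FALSE (it coerces the non-NA codes to logical, and any non-zero code counts as TRUE), so sum() of that one value is always 1. Counting the TRUEs from !is.na() directly avoids this. Below is a minimal sketch that rebuilds the two sample rows from the question; the data frame name y follows the question, while diag_cols and z_apply are names made up here for illustration.

```r
# Rebuild the two sample rows from the question (y is the question's data frame name).
y <- data.frame(
  diag1 = c(786, 845), diag2 = c(272, 530), diag3 = c(401, 338),
  diag4 = c(782, 311), diag5 = c(250, NA),  diag6 = c(91912, NA),
  diag7 = c(530, NA),  diag8 = c(NA, NA),   diag9 = c(NA, NA),
  diag10 = c(NA, NA),  diag11 = c(NA, NA)
)

# Hypothetical helper variable: the names of the 11 diagnosis columns.
diag_cols <- paste0("diag", 1:11)

# Vectorised count: !is.na() gives a logical matrix, rowSums() counts TRUEs per row.
z <- rowSums(!is.na(y[, diag_cols]))

# Equivalent apply() form, closer to the original attempt but without any().
z_apply <- apply(y[, diag_cols], 1, function(x) sum(!is.na(x)))

print(z)
print(z_apply)
```

Both versions give 7 for the first row and 4 for the second, matching the counts expected in the question. rowSums is usually the faster of the two on a large data frame because it avoids calling an R function once per row.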