I have a large dataframe of doctor visit records. Each record (row) can have up to 11 diagnosis codes. I want to know how many non-NA diagnosis codes are in each row.
Here is a sample of the data:
diag1 diag2 diag3 diag4 diag5 diag6 diag7 diag8 diag9 diag10 diag11
  786   272   401   782   250 91912   530    NA    NA     NA     NA
  845   530   338   311    NA    NA    NA    NA    NA     NA     NA
So in these two rows, I would want to know that row 1 had 7 codes and row 2 had 4 codes. The dataframe has 31,596 rows, so a loop is taking way too long. I'd like to use an "apply" statement to speed things up:
z = apply(y[,paste("diag", 1:11, sep="")], 1, function(x)sum({any(x[!is.na(x)])}))
R just returns a vector of 1's with length equal to the number of rows in the dataset. I think something is wrong with my use of "any"? Does anyone have a good way to count the number of non-NA values across multiple columns? Thanks!
To find the sum of the non-missing values in an R data frame column, we can simply use the sum function with na.rm set to TRUE. For example, if we have a data frame called df that contains a column x with some missing values, the sum of the non-missing values is sum(df$x, na.rm = TRUE).
R automatically converts logical vectors to integer vectors when using arithmetic functions: TRUE becomes 1 and FALSE becomes 0. Thus, sum(is.na(x)) gives the total number of missing values in x, and sum(!is.na(x)) gives the number of non-missing values.
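For instance, a minimal sketch on a toy vector (the vector x here is just illustrative):

x <- c(786, 272, NA, 401, NA)
sum(is.na(x))    # 2 missing values
sum(!is.na(x))   # 3 non-missing values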
Your apply version returns all 1's because any(x[!is.na(x)]) collapses each row's non-NA values to a single TRUE (any nonzero number coerces to TRUE), and sum(TRUE) is 1. Just use is.na and rowSums:
z <- rowSums(!is.na(y[,paste("diag", 1:11, sep="")]))
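As a quick sanity check, here is a sketch that rebuilds the two sample rows from the question as a data frame named y (the name taken from the code above) and counts the non-NA codes per row; a corrected apply version is included for comparison:

# rebuild the two sample rows from the question
y <- data.frame(diag1 = c(786, 845), diag2  = c(272, 530),
                diag3 = c(401, 338), diag4  = c(782, 311),
                diag5 = c(250, NA),  diag6  = c(91912, NA),
                diag7 = c(530, NA),  diag8  = c(NA, NA),
                diag9 = c(NA, NA),   diag10 = c(NA, NA),
                diag11 = c(NA, NA))

# is.na() returns a logical matrix; rowSums() counts the TRUEs in each row
z <- rowSums(!is.na(y[, paste("diag", 1:11, sep = "")]))
z
# [1] 7 4

# equivalent (but slower) apply version, fixing the original attempt
z2 <- apply(y[, paste("diag", 1:11, sep = "")], 1, function(x) sum(!is.na(x)))

rowSums operates on the whole logical matrix in vectorized code, so it should be much faster than a loop or apply over 31,596 rows.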