 

R: data.table count !NA per row

Tags:

r

data.table

I am trying to count the number of columns that do not contain NA for each row, and place that value into a new column for that row.

Example data:

library(data.table)

a = c(1,2,3,4,NA)
b = c(6,NA,8,9,10)
c = c(11,12,NA,14,15)
d = data.table(a,b,c)

> d 
    a  b  c
1:  1  6 11
2:  2 NA 12
3:  3  8 NA
4:  4  9 14
5: NA 10 15

My desired output would include a new column num_obs which contains the number of non-NA entries per row:

    a  b  c num_obs
1:  1  6 11       3
2:  2 NA 12       2
3:  3  8 NA       2
4:  4  9 14       3
5: NA 10 15       2

I've been reading for hours now and so far the best I've come up with is looping over rows, which I know is never advisable in R or data.table. I'm sure there is a better way to do this, please enlighten me.

My crappy way:

len = (1:NROW(d))
for (n in len) {
  d[n, num_obs := length(which(!is.na(d[n])))]
}
asked Feb 10 '16 by Reilstein

People also ask

How do I count NA in a row in R?

1. Count the Number of NA's per Row with rowSums() The first method to find the number of NA's per row in R uses the power of the functions is.na() and rowSums(). Both the is.na() function and the rowSums() function are R base functions.
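As a sketch, applying the rowSums() approach described above to the question's example data (same d as defined in the question) would look like this:

```r
library(data.table)

a = c(1,2,3,4,NA)
b = c(6,NA,8,9,10)
c = c(11,12,NA,14,15)
d = data.table(a,b,c)

# !is.na(d) produces a logical matrix (TRUE = observed value);
# rowSums() then counts the TRUEs in each row
d[, num_obs := rowSums(!is.na(d))]
```

This yields the desired num_obs column of 3, 2, 2, 3, 2.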

How do I count the number of NA values in a column in R?

The easiest way to count the number of NA's in R in a single column is by using the functions sum() and is.na(). The is.na() function takes one column as input and converts all the missing values into ones and all other values into zeros.

How do I find the number of missing values in R?

R automatically converts logical vectors to integer vectors when using arithmetic functions. In the process TRUE gets turned to 1 and FALSE gets converted to 0 . Thus, sum(is.na(x)) gives you the total number of missing values in x .
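For example, counting missing values in a single vector:

```r
x <- c(1, NA, 3, NA, 5)

# is.na(x) is c(FALSE, TRUE, FALSE, TRUE, FALSE);
# sum() coerces the logicals to 0/1 and adds them up
sum(is.na(x))   # 2
```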


1 Answer

Try this one using Reduce to chain together + calls:

d[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))]

If speed is critical, you can eke out a touch more with Ananda's suggestion to hardcode the number of columns being assessed (the 4 below matches the larger benchmarking table; the 3-column example d would use 3):

d[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))]
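As a sketch, here is the Reduce approach run against the question's 3-column example d; .SDcols restricts .SD to the original columns so the hardcoded variant (3 here, since d has three columns) stays correct even after num_obs is added:

```r
library(data.table)
d = data.table(a = c(1,2,3,4,NA),
               b = c(6,NA,8,9,10),
               c = c(11,12,NA,14,15))

# Reduce(`+`, ...) adds the three logical !is.na() columns element-wise
d[, num_obs := Reduce(`+`, lapply(.SD, function(x) !is.na(x))),
  .SDcols = c("a", "b", "c")]

# Hardcoded variant: total columns minus the per-row NA count
d[, num_obs := 3 - Reduce(`+`, lapply(.SD, is.na)),
  .SDcols = c("a", "b", "c")]
```

Both assignments produce num_obs of 3, 2, 2, 3, 2, matching the desired output in the question.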

Benchmarking using Ananda's suggested larger data.table d:

fun1 <- function(indt) indt[, num_obs := rowSums(!is.na(indt))][]
fun3 <- function(indt) indt[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))][]
fun4 <- function(indt) indt[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))][]

library(microbenchmark)
microbenchmark(fun1(copy(d)), fun3(copy(d)), fun4(copy(d)), times=10L)

#Unit: milliseconds
#          expr      min       lq     mean   median       uq      max neval
# fun1(copy(d)) 3.565866 3.639361 3.912554 3.703091 4.023724 4.596130    10
# fun3(copy(d)) 2.543878 2.611745 2.973861 2.664550 3.657239 4.011475    10
# fun4(copy(d)) 2.265786 2.293927 2.798597 2.345242 3.385437 4.128339    10
answered Sep 19 '22 by thelatemail