
Count NA in given columns by rows

Tags: r, dplyr

I would like to count the NA values in selected columns, row by row, and save the result in a new column. I would like to achieve this with the mutate() function from dplyr.

How it should work:

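# Intended result, written as an explicit loop over the rows of test
test$SUM <- test$SUM2 <- test$SUM3 <- NA_integer_
for (i in seq_len(nrow(test))) {
  test$SUM[i]  <- sum(is.na(test[i, 1:2]))  # NAs in columns 1-2 of row i
  test$SUM2[i] <- sum(is.na(test[i, 3:4]))  # NAs in columns 3-4 of row i
  test$SUM3[i] <- sum(is.na(test[i, 5:6]))  # NAs in columns 5-6 of row i
}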

Data used:

test <- data.frame(
  BIEZ_01 = c(59000, 61462, NA, 33000, 30840, 36612),
  BIEZ_02 = c(5060, 55401, 33000, 33000, 30840, 28884),
  BIEZ_03 = c(NA, 60783, 20000, 20000, NA, 19248),
  BIEZ_04 = c(22100, 59885, 15000, 15000, 20840, 10000),
  BIEZ_05 = c(NA, 59209, 15000, 15000, 20840, NA),
  BIEZ_06 = c(4400, 6109, NA, 500, 10840, 10000))
Piotr asked Oct 16 '25 15:10


2 Answers

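Another option is to split the column indices into consecutive pairs and count the NAs in each pair with rowSums():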

NA.counts <- sapply(split(seq(ncol(test)), ceiling(seq(ncol(test)) / 2)),
                    function(x) rowSums(is.na(test[, x])))
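
To see why this groups the columns in pairs, it may help to look at the index trick on its own. The following is only a small sketch; the names idx, pair_id and groups are made up for illustration, and 6 stands in for ncol(test).

```r
# Column positions 1..6, matching the six columns of `test`
idx <- seq(6)

# ceiling(idx / 2) labels each position with its pair number: 1 1 2 2 3 3
pair_id <- ceiling(idx / 2)

# split() collects the positions that share a label into one list element
groups <- split(idx, pair_id)
str(groups)
```

Each list element (positions 1:2, 3:4 and 5:6) is then passed to the function given to sapply(), which counts the NAs per row within that pair of columns. Changing the divisor 2 should give groups of a different width.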

If you want to use the tidyverse to add the columns, you can do:

library(tidyverse)
test %>% 
  cbind(NA.counts = map(seq(ncol(test)) %>% split(ceiling(./2))
                        , ~rowSums(is.na(test[, .]))))


#   BIEZ_01 BIEZ_02 BIEZ_03 BIEZ_04 BIEZ_05 BIEZ_06 NA.counts.1 NA.counts.2 NA.counts.3
# 1   59000    5060      NA   22100      NA    4400           0           1           1
# 2   61462   55401   60783   59885   59209    6109           0           0           0
# 3      NA   33000   20000   15000   15000      NA           1           0           1
# 4   33000   33000   20000   15000   15000     500           0           0           0
# 5   30840   30840      NA   20840   20840   10840           0           1           0
# 6   36612   28884   19248   10000      NA   10000           0           0           1

As @Moody_Mudskipper points out, cbind isn't necessary if you want to modify the data frame in place. You can add the columns with:

test[paste0("SUM",seq(ncol(test)/2))] <- map(seq(ncol(test)) %>% split(ceiling(./2)), 
                                             ~rowSums(is.na(test[.])))
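
Since the question asks specifically for mutate(), here is a minimal sketch of how the same counts could be written with it. This assumes the column pairs are named explicitly; the cbind() approach is just one possible way to do it, and the column names SUM, SUM2 and SUM3 are taken from the question.

```r
library(dplyr)

test <- data.frame(
  BIEZ_01 = c(59000, 61462, NA, 33000, 30840, 36612),
  BIEZ_02 = c(5060, 55401, 33000, 33000, 30840, 28884),
  BIEZ_03 = c(NA, 60783, 20000, 20000, NA, 19248),
  BIEZ_04 = c(22100, 59885, 15000, 15000, 20840, 10000),
  BIEZ_05 = c(NA, 59209, 15000, 15000, 20840, NA),
  BIEZ_06 = c(4400, 6109, NA, 500, 10840, 10000))

# cbind() builds a matrix from each pair of columns;
# rowSums(is.na(...)) then counts the NAs in every row of that matrix
result <- test %>%
  mutate(
    SUM  = rowSums(is.na(cbind(BIEZ_01, BIEZ_02))),
    SUM2 = rowSums(is.na(cbind(BIEZ_03, BIEZ_04))),
    SUM3 = rowSums(is.na(cbind(BIEZ_05, BIEZ_06)))
  )

result[c("SUM", "SUM2", "SUM3")]
```

This is more verbose than the split() version when there are many column pairs, but it keeps everything inside a single mutate() call.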
IceCreamToucan answered Oct 18 '25 08:10


Here is a solution using the apply() function:

# Count the NAs per column pair, one row of test at a time
NA_counts <- apply(test, 1, function(x) {
  c(SUM1 = sum(is.na(x[c(1, 2)])),
    SUM2 = sum(is.na(x[c(3, 4)])),
    SUM3 = sum(is.na(x[c(5, 6)])))
})
cbind(test, t(NA_counts))
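
The t() call is needed because apply() over rows returns each row's result as a column. A small sketch on a toy matrix may make this clearer; the matrix m and the names n_na and n_ok are made up for illustration and are not part of the data above.

```r
# Toy matrix with 3 rows and 2 columns (filled column by column)
m <- matrix(c(1, NA, 3, NA, NA, 6), nrow = 3)

# For each row, return a named vector of length 2
res <- apply(m, 1, function(x) c(n_na = sum(is.na(x)), n_ok = sum(!is.na(x))))

dim(res)     # one column per row of m
dim(t(res))  # transposed: one row per row of m, ready for cbind()
```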
tyumru answered Oct 18 '25 07:10