Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

r count cells with missing values across each row [duplicate]

I have a dataframe as shown below

    Id         Date         Col1       Col2     Col3        Col4
    30         2012-03-31              A42.2    20.46        NA  
    36         1996-11-15   NA                  V73          55
    96         2010-02-07   X48        Z16      13
    40         2010-03-18   AD14                20.12        36
    69         2012-02-21              22.45                     
    11         2013-07-03   81         V017                  TCG11         
    22         2001-06-01                       67
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   
    34         2014-03-10   82.12      N72.22   V45.44

I am trying to count the number of NA or Empty cells across each row and the final expected output is as follows

    Id         Date         Col1       Col2     Col3        Col4       MissCount
    30         2012-03-31              A42.2    20.46        NA        2
    36         1996-11-15   NA                  V73          55        2
    96         2010-02-07   X48        Z16      13                     1
    40         2010-03-18   AD14                20.12        36        1
    69         2012-02-21              22.45                           3
    11         2013-07-03   81         V017                  TCG11     1    
    22         2001-06-01                       67                     3
    83         2005-03-16   80.45      V22.15   46.52        X29.11    0
    92         2012-02-12                                              4   
    34         2014-03-10   82.12      N72.22   V45.44                 1

The last column MissCount will store the number of NAs or empty cells for each row. Any help is much appreciated.

like image 593
Emily Fassbender Avatar asked Dec 02 '25 09:12

Emily Fassbender


1 Answers

The one-liner

rowSums(is.na(df) | df == "")

given by @DavidArenburg in his comment is definitely the way to go, assuming that you don't mind checking every column in the data frame. If you really only want to check Col1 through Col4, then using an apply function might make more sense.

apply(df, 1, function(x) {
                sum(is.na(x[c("Col1", "Col2", "Col3", "Col4")])) +
                sum(x[c("Col1", "Col2", "Col3", "Col4")] == "", na.rm=TRUE)
             })

Edit: Shortened code

apply(df[c("Col1", "Col2", "Col3", "Col4")], 1, function(x) {
                    sum(is.na(x)) +
                    sum(x == "", na.rm=TRUE)
                 })

or if data columns are exactly like the example data:

apply(df[3:6], 1, function(x) {
                        sum(is.na(x)) +
                        sum(x == "", na.rm=TRUE)
                     })
like image 62
Tim Biegeleisen Avatar answered Dec 03 '25 22:12

Tim Biegeleisen



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!