Handling missing/incomplete data in R--is there a function to mask but not remove NAs?

Tags: r, missing-data, data-processing

As you would expect from a DSL aimed at data analysis, R handles missing/incomplete data very well, for instance:

Many R functions have an na.rm argument that, when set to TRUE, removes the NAs before the computation:

# mean of the non-NA values (5, 6, 12, 87, 9, 43, 67)
> v = mean(c(5, NA, 6, 12, NA, 87, 9, NA, 43, 67), na.rm=TRUE)
> v
[1] 32.71429

But if you want to deal with NAs before the function call, you need to do something like this:

To remove each 'NA' from a vector:

vx = vx[!is.na(vx)]

To replace each 'NA' in a vector with a '0':

vx = ifelse(is.na(vx), 0, vx)

To remove every row that contains an 'NA' from a data frame:

dfx = dfx[complete.cases(dfx),]

All of these permanently remove each 'NA', or each row with an 'NA' in it, once the result is assigned back.

Sometimes this isn't quite what you want though. Making an 'NA'-excised copy of the data frame might be necessary for the next step in the workflow, but in subsequent steps you often want those rows back (e.g., to calculate a column-wise statistic for a column that has no 'NA' values of its own but lost rows to a prior call to complete.cases).
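The situation described above can be sketched with a small made-up data frame. The names dfx_demo, a, and b are hypothetical and chosen only for illustration: column b has no 'NA' of its own, yet its statistic changes once complete.cases has dropped rows because of NAs in column a.

```r
# Hypothetical example data: only column 'a' contains NAs.
dfx_demo <- data.frame(a = c(1, NA, 3, NA),
                       b = c(10, 20, 30, 40))

# Dropping incomplete rows also discards valid values of 'b'.
dfx_complete <- dfx_demo[complete.cases(dfx_demo), ]

mean_b_all      <- mean(dfx_demo$b)      # uses all four values of 'b'
mean_b_complete <- mean(dfx_complete$b)  # uses only the rows that survived

print(mean_b_all)
print(mean_b_complete)
```

The two means differ because rows 2 and 4 were removed on account of column a alone.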

To be as clear as possible about what I'm looking for: Python/NumPy has a class, the masked array (numpy.ma.MaskedArray), with a mask attribute, which lets you conceal--but not remove--NAs during a function call. Is there an analogous function in R?


asked Apr 10 '10 12:04

doug

1 Answer

Exactly what to do with missing data -- which may be flagged as NA if we know it is missing -- may well differ from domain to domain.

To take an example related to time series, where you may want to skip, or fill, or interpolate, or interpolate differently: the (very useful and popular) zoo package alone has all these functions related to NA handling:

zoo::na.approx  zoo::na.locf     zoo::na.spline  zoo::na.trim

These let you approximate (using different algorithms), carry observations forward or backward, use spline interpolation, or trim.
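As a rough illustration of the four zoo functions named above, the sketch below applies each to one small series. It assumes the zoo package is installed; the series z and the result names are hypothetical, and only default arguments are used.

```r
library(zoo)  # assumption: the zoo package is installed

# Hypothetical series with leading, interior, and trailing NAs.
z <- zoo(c(NA, 1, NA, 3, 4, NA, 6, NA))

z_approx <- na.approx(z)  # linear interpolation between observed values
z_locf   <- na.locf(z)    # last observation carried forward
z_spline <- na.spline(z)  # spline interpolation
z_trim   <- na.trim(z)    # drops leading and trailing NAs only

print(z_approx)
print(z_locf)
print(z_spline)
print(z_trim)
```

Each call returns a new object and leaves z unchanged, so the original NAs remain available for later steps.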

Another example would be the numerous missing-data imputation packages on CRAN -- often providing domain-specific solutions. [ So if you call R a DSL, what is this? "Sub-domain specific solutions for domain specific languages" or SDSSFDSL? Quite a mouthful :) ]

But for your specific question: no, I am not aware of a bit-level flag in base R that allows you to mark observations as 'to be excluded'. I presume most R users would resort to functions like na.omit() et al or use the na.rm=TRUE option you mentioned.
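One way to approximate masking in base R, offered only as a sketch and not as a built-in facility, is to leave the data untouched and keep a separate logical vector that is applied only at the moment of the call. The helper name apply_masked and the example vector vx_demo are hypothetical.

```r
# Hypothetical helper: evaluate 'fun' on the unmasked elements of 'x'
# without modifying 'x' itself. 'mask' is TRUE where a value is hidden.
apply_masked <- function(x, fun, mask = is.na(x), ...) {
  fun(x[!mask], ...)
}

vx_demo <- c(5, NA, 6, 12, NA, 87, 9, NA, 43, 67)

masked_mean <- apply_masked(vx_demo, mean)  # NAs hidden for this call only
masked_max  <- apply_masked(vx_demo, max)

# The original vector still holds its NAs, in their original positions.
na_count_after <- sum(is.na(vx_demo))

print(masked_mean)
print(masked_max)
print(na_count_after)
```

Because the mask is an ordinary logical vector, it could also hide values for reasons other than being NA, which is roughly the role the mask plays in a NumPy masked array.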

answered Sep 29 '22 06:09

Dirk Eddelbuettel
