I have a matrix in R and would like to take a single random sample from each row. Some of my data are NA, and when taking the random sample I do not want NA to be one of the possible outcomes. How would I accomplish this?
For example,
a <- matrix (c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol=5, nrow=5)
a
[,1] [,2] [,3] [,4] [,5]
[1,] 5 5 10 10 NA
[2,] 5 5 10 10 NA
[3,] 5 5 10 10 NA
[4,] 5 5 10 10 NA
[5,] 5 5 10 10 NA
When I apply the sample function to this matrix to output another matrix, I get:
b <- matrix(apply(a, 1, sample, size=1), ncol=1)
b
[,1]
[1,] NA
[2,] NA
[3,] 10
[4,] 10
[5,] 5
Instead, I don't want NA to be a possible output; I want the result to look something like:
b
[,1]
[1,] 10
[2,] 10
[3,] 10
[4,] 5
[5,] 10
If you want to exclude missing values from mathematical operations, use the na.rm = TRUE argument; if you do not exclude them, most functions will return NA. You may also want to subset your data to complete observations, that is, the rows that contain no missing data.
A missing value is one whose value is unknown. Missing values are represented in R by the NA symbol.
You can use the na.omit() function in R to remove incomplete cases from a vector, matrix, or data frame.
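A short illustration of the NA-handling functions just mentioned, using made-up toy data (x and m are only examples, not from the question):

```r
x <- c(5, 10, NA)
mean(x)                  # NA, because one value is missing
mean(x, na.rm = TRUE)    # 7.5, the NA is dropped before averaging
as.numeric(na.omit(x))   # 5 10 (as.numeric drops the "na.action" attribute)

m <- matrix(c(1, NA, 3, 4), nrow = 2)  # row 1 = 1 3, row 2 = NA 4
complete.cases(m)                      # TRUE FALSE
m[complete.cases(m), , drop = FALSE]   # keeps only row 1, the row with no NA
```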
There might be a better way, but sample doesn't have any parameter for handling NAs, so instead I wrote an anonymous function that removes the NAs before sampling.
apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)})
essentially does what you want. If you really want matrix output, you could do:
b <- matrix(apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)}), ncol = 1)
Edit: You didn't ask for this, but my proposed solution does fail in certain cases (mainly if a row contains ONLY NAs).
a <- matrix (c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol=5, nrow=5)
# My solution works fine with your example data
apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)})
# What happens if a row contains only NAs
a[1,] <- NA
# Now it throws an error: row 1 has no non-NA values left to sample from
apply(a, 1, function(x){sample(x[!is.na(x)], size = 1)})
# We can rewrite the function to deal with that case
mysample <- function(x, ...){
if(all(is.na(x))){
return(NA)
}
return(sample(x[!is.na(x)], ...))
}
# Using the new function things work.
apply(a, 1, mysample, size = 1)
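One more edge case is worth knowing: when a row has exactly one non-NA value, both versions above hit a documented quirk of sample(). If its argument is a single numeric value x >= 1, sample() draws from 1:x, so a row whose only value is 5 can return 1, 2, 3, 4 or 5. A minimal sketch that avoids this, using a hypothetical helper named safe_sample() (my name, not part of the answer), indexes into the non-NA values with sample.int() instead:

```r
# Hypothetical helper: draw `size` non-NA values from x (one row of the matrix).
# Returns NA when the row contains no non-NA values at all.
safe_sample <- function(x, size = 1) {
  vals <- x[!is.na(x)]
  if (length(vals) == 0) {
    return(NA)
  }
  # Pick positions with sample.int() so that a single remaining value is
  # returned as-is, instead of being expanded to 1:value by sample().
  vals[sample.int(length(vals), size)]
}

a <- matrix(c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol = 5, nrow = 5)
a[1, ] <- NA     # row 1: only NAs
a[2, 2:4] <- NA  # row 2: a single non-NA value (5)

b <- matrix(apply(a, 1, safe_sample, size = 1), ncol = 1)
b  # row 1 is NA, row 2 is always 5, rows 3-5 are 5 or 10
```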
I think @Dason's solution works quite well, but you can also try this:
a <- matrix (c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol=5, nrow=5)
matrix(sample(na.omit(as.numeric(a)),ncol(a)))
[,1]
[1,] 10
[2,] 5
[3,] 10
[4,] 10
[5,] 5
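This also runs without error when an entire row or column is NA. Note, however, that it draws from the pooled non-NA values of the whole matrix rather than from each row separately, so a returned value need not come from its own row (row 1 below is all NA, yet a value is returned for it). It also returns ncol(a) values, which equals the number of rows only for a square matrix. For instance: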
set.seed(007)
a <- matrix(sample(1:100, 25), 5)
a[1,] <- NA
a[5,1] <- NA
a[,3] <- NA
a[5,5] <- NA
a[3,2] <- NA
matrix(sample(na.omit(as.numeric(a)),ncol(a)))
[,1]
[1,] 40
[2,] 1
[3,] 42
[4,] 26
[5,] 32
I guess this is what you were looking for (at least, it is another approach).
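Because the pooled approach above does not keep each draw within its own row, here is a vectorized per-row sketch (my own alternative, not taken from either answer). It assumes a numeric matrix a: it shuffles the positions of all non-NA cells and keeps the first position seen for each row, which is a uniformly random non-NA cell of that row; rows with no non-NA cells stay NA.

```r
a <- matrix(c(rep(5, 10), rep(10, 10), rep(NA, 5)), ncol = 5, nrow = 5)
a[1, ] <- NA  # a row with only NAs

# Row/column positions of every non-NA cell (columns named "row" and "col")
idx <- which(!is.na(a), arr.ind = TRUE)
# Shuffle the positions; the first one seen for each row is then a
# uniformly random non-NA cell of that row
idx <- idx[sample.int(nrow(idx)), , drop = FALSE]
pick <- idx[!duplicated(idx[, "row"]), , drop = FALSE]

# One value per row; rows without any non-NA value remain NA
b <- matrix(NA_real_, nrow = nrow(a), ncol = 1)
b[pick[, "row"], 1] <- a[pick]
b
```

Unlike the apply() versions, this avoids calling sample() on each row, so the single-value quirk of sample() does not arise.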