
Applying custom function to each row uses only first value of argument


I am trying to recode NA values to 0 in a subset of columns using the following dataset:

set.seed(1)
df <- data.frame(
  id = c(1:10),
  trials = sample(1:3, 10, replace = TRUE),
  t1 = c(sample(c(1:9, NA), 10)),
  t2 = c(sample(c(1:7, rep(NA, 3)), 10)),
  t3 = c(sample(c(1:5, rep(NA, 5)), 10))
  )

Each row has a certain number of trials associated with it (between 1 and 3), specified by the trials column. Columns t1-t3 represent scores for each trial.

The number of trials indicates the subset of columns in which NAs should be recoded to 0: NAs that are within the number of trials represent missing data, and should be recoded as 0, while NAs outside the number of trials are not meaningful, and should remain NAs. So, for a row where trials == 3, an NA in column t3 would be recoded as 0, but in a row where trials == 2, an NA in t3 would remain an NA.

So, I tried using this function:

replace0 <- function(x, num.sun) {
  x[which(is.na(x[1:(num.sun + 2)]))] <- 0
  return(x)
}

This works well for single vectors. When I try applying the same function to a data frame with apply(), though:

apply(df, 1, replace0, num.sun = df$trials)

I get a warning saying:

In 1:(num.sun + 2) :
  numerical expression has 10 elements: only the first used

The result is that instead of having the value of num.sun change every row according to the value in trials, apply() simply uses the first value in the trials column for every single row. How could I apply the function so that the num.sun argument changes according to the value of df$trials?

Thanks!

Edit: as some have commented, the original example data had some non-NA scores that didn't make sense according to the trials column. Here's a corrected dataset:

df <- data.frame(
  id = c(1:5),
  trials = c(rep(1, 2), rep(2, 1), rep(3, 2)),
  t1 = c(NA, 7, NA, 6, NA),
  t2 = c(NA, NA, 3, 7, 12),
  t3 = c(NA, NA, NA, 4, NA)
)


asked Sep 02 '18 07:09

Aziggy

2 Answers

Another approach:

# create an index of the NA values
w <- which(is.na(df), arr.ind = TRUE)

# create an index with the max column by row where an NA is allowed to be replaced by a zero
m <- matrix(c(1:nrow(df), (df$trials + 2)), ncol = 2)

# subset 'w' such that only the NA's which fall in the scope of 'm' remain
i <- w[w[,2] <= m[,2][match(w[,1], m[,1])], , drop = FALSE]

# use 'i' to replace the allowed NA's with a zero
df[i] <- 0

which gives:

> df
   id trials t1 t2 t3
1   1      1  3 NA  5
2   2      2  2  2 NA
3   3      2  6  6  4
4   4      3  0  1  2
5   5      1  5 NA NA
6   6      3  7  0  0
7   7      3  8  7  0
8   8      2  4  5  1
9   9      2  1  3 NA
10 10      1  9  4  3

You could easily wrap this in a function:

replace.NA.with.0 <- function(df) {
  w <- which(is.na(df), arr.ind = TRUE)
  m <- matrix(c(1:nrow(df), (df$trials + 2)), ncol = 2)
  i <- w[w[,2] <= m[,2][match(w[,1], m[,1])], , drop = FALSE]
  df[i] <- 0
  return(df)
}

Now, using replace.NA.with.0(df) will produce the above result.
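
The function above relies on indexing with a two-column (row, column) matrix, which is what which(..., arr.ind = TRUE) returns. A minimal sketch of that mechanism on a small hypothetical matrix (m_demo, idx and keep are illustrative names, not part of the answer):

```r
# Hypothetical 2 x 3 matrix with three NAs
m_demo <- matrix(c(1, NA, 3,
                   NA, 5, NA), nrow = 2, byrow = TRUE)

# One row per NA cell, with columns named "row" and "col"
idx <- which(is.na(m_demo), arr.ind = TRUE)

# Keep only the NA cells in the first two columns;
# drop = FALSE keeps the result a matrix even if a single row remains
keep <- idx[idx[, "col"] <= 2, , drop = FALSE]

# A two-column index matrix addresses individual cells
m_demo[keep] <- 0
```

After this, the NAs in columns 1 and 2 are zero while the NA in column 3 is untouched, which mirrors what the answer does with the trials + 2 cut-off per row.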

As noted by others, some rows (1, 3 and 10) have more values than trials. You could tackle that problem by rewriting the above function to:

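replace.with.NA.or.0 <- function(df) {
  # first set every NA to zero
  w <- which(is.na(df), arr.ind = TRUE)
  df[w] <- 0

  # last in-scope column per row; defined here so the function does not depend on a global 'm'
  m <- matrix(c(1:nrow(df), (df$trials + 2)), ncol = 2)

  # then set every column after the in-scope ones back to NA (5 is the number of columns in df)
  v <- tapply(m[,2], m[,1], FUN = function(x) tail(x:5,-1))
  ina <- matrix(as.integer(unlist(stack(v)[2:1])), ncol = 2)
  df[ina] <- NA

  return(df)
}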

Now, using replace.with.NA.or.0(df) produces the following result:

   id trials t1 t2 t3
1   1      1  3 NA NA
2   2      2  2  2 NA
3   3      2  6  6 NA
4   4      3  0  1  2
5   5      1  5 NA NA
6   6      3  7  0  0
7   7      3  8  7  0
8   8      2  4  5 NA
9   9      2  1  3 NA
10 10      1  9 NA NA
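
The same two rules (in-scope NAs become 0, anything beyond the number of trials becomes NA) can also be written with a logical mask. This is only a sketch under assumptions: the score columns are named t1 to t3 in trial order, and df_demo is a small hypothetical data frame, not the question's data.

```r
# Hypothetical data: row 1 has a stray t2 score, row 2 a stray t3 score
df_demo <- data.frame(
  id = 1:3,
  trials = c(1, 2, 3),
  t1 = c(NA, 4, NA),
  t2 = c(9, NA, 5),
  t3 = c(NA, 2, NA)
)

score_cols <- c("t1", "t2", "t3")   # assumed names of the score columns
scores <- as.matrix(df_demo[score_cols])

# col(scores) gives each cell's column number; df_demo$trials is recycled
# down the rows, so a cell is TRUE when its trial number is within scope
in_scope <- col(scores) <= df_demo$trials

scores[!in_scope] <- NA                   # values beyond 'trials' are not meaningful
scores[in_scope & is.na(scores)] <- 0     # missing in-scope scores become 0

df_demo[score_cols] <- scores
```

Dropping the `scores[!in_scope] <- NA` line leaves stray values untouched and only recodes the in-scope NAs.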


answered Sep 28 '22 16:09

Jaap

Here I just rewrite your function using double subsetting, x[paste0('t', x['trials'])], which overcomes the problem in the other two solutions with row 6:

replace0 <- function(x){
  x_na <- x[paste0('t', x['trials'])]
  if(is.na(x_na)){x[paste0('t', x['trials'])] <- 0}
  return(x)
}

t(apply(df, 1, replace0))

     id trials t1 t2 t3
[1,]  1      1  3 NA  5
[2,]  2      2  2  2 NA
[3,]  3      2  6  6  4
[4,]  4      3 NA  1  2
[5,]  5      1  5 NA NA
[6,]  6      3  7 NA  0
[7,]  7      3  8  7  0
[8,]  8      2  4  5  1
[9,]  9      2  1  3 NA
[10,] 10      1  9  4  3
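
Note that replace0 only checks the single column t<trials>. If the goal is to recode every NA from t1 up to t<trials>, a row-wise variant can read trials from the row itself, which also avoids passing the whole df$trials vector to every call. A sketch, assuming the score columns are named t1 to t3 and using the corrected dataset from the question (replace_in_scope is a hypothetical name):

```r
# Corrected dataset from the question's edit
df <- data.frame(
  id = 1:5,
  trials = c(1, 1, 2, 3, 3),
  t1 = c(NA, 7, NA, 6, NA),
  t2 = c(NA, NA, 3, 7, 12),
  t3 = c(NA, NA, NA, 4, NA)
)

# x is one row as a named numeric vector (apply() converts df to a matrix)
replace_in_scope <- function(x, score_cols) {
  n <- x[["trials"]]                    # this row's number of trials
  in_scope <- score_cols[seq_len(n)]    # e.g. c("t1", "t2") when n == 2
  vals <- x[in_scope]
  vals[is.na(vals)] <- 0                # recode only the in-scope NAs
  x[in_scope] <- vals
  x
}

# apply() returns one column per row of df, hence the transpose
result <- as.data.frame(
  t(apply(df, 1, replace_in_scope, score_cols = c("t1", "t2", "t3")))
)
```

Because apply() works on a matrix, this only behaves as intended while all columns of df are numeric.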

answered Sep 28 '22 17:09

A. Suliman
