In R: How to replace NA in a Vector found between two integers

Tags: r, vector

I have the following vector:

A:(NA NA NA NA 1 NA NA 4 NA NA 1 NA NA NA NA NA 4 NA 1 NA 4)

I would like to replace all the NAs between 1 and 4 with 2 (but not the NAs between 4 and 1).

Are there any approaches you would recommend/use for this task?

It may also be managed as a dataframe:

 A 
----
 NA 
 NA 
 NA 
 NA 
 1 
 NA 
 NA 
 4 
 NA 
 NA 
 1 
 NA 
 NA 
 NA 
 NA 
 NA
 4 
 NA 
 1
 NA 
 4
----

Edit: 1. I changed the string "Na" to NA.

SOLUTION/UPDATE: Thank you to everyone for your insights. Building on them, I came up with the following solution for my case. I hope it is useful to someone else:

A <- c(df$A)

index.1<-which(df$A %in% c(1)) # define location for 1s in A
index.14<-which(df$A %in% c(1,4)) # define location for 1s and 4s in A

loc.1<-which(index.14 %in% index.1) # location of 1s in  index.14
loc.4<-loc.1+1 # location of 4s relative to 1s in index.14

start.i<-((index.14[loc.1])+1) # starting index for replacing with 2
end.i<-((index.14[loc.4])-1) # ending index for replacing with 2 in index

fill.v<-sort(c(start.i, end.i))# sequence of indexes to fill-in with # 2

# create matrix of beginning and ending sequence
fill.m<-matrix(fill.v,nrow = (length(fill.v)/2),ncol = 2, byrow=TRUE) 

# create a list with indexes to replace
list.1<-apply(fill.m, MARGIN=1,FUN=function(x) seq(x[1],x[2])) 

# unlist list to use as the indexes for replacement
list.2<-unlist(list.1) 

df$A[list.2] <- 2 # replace indexed location with 2
Anthony O'Brien asked Mar 04 '19 15:03


2 Answers

Assuming A is as shown reproducibly in the Note at the end, the difference of the two cumsums is nonzero (and so treated as TRUE) from each 1 up to, but not including, the following 4. The second condition then eliminates the 1 itself, leaving only the "Na" positions. Finally we replace the positions that are still TRUE with 2.

replace(A, (cumsum(A == 1) - cumsum(A == 4)) & (A == "Na"), 2)

giving:

 [1] "Na" "Na" "Na" "Na" "1"  "2"  "2"  "4"  "Na" "Na" "1"  "2"  "2"  "2"  "2" 
[16] "2"  "4"  "Na" "1"  "2"  "4"
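
To see why the cumsum difference works, it can help to print the intermediate vectors side by side. This is only an illustrative sketch: the names ones, fours, inside and steps are made up here, and the comparison uses the strings "1" and "4" explicitly, whereas the one-liner above relies on R coercing 1 and 4 to character.

```r
# Sample data from the Note: a character vector using the string "Na".
A <- c("Na", "Na", "Na", "Na", "1", "Na", "Na", "4", "Na", "Na",
       "1", "Na", "Na", "Na", "Na", "Na", "4", "Na", "1", "Na", "4")

ones   <- cumsum(A == "1")  # number of 1s seen so far
fours  <- cumsum(A == "4")  # number of 4s seen so far
inside <- ones - fours      # 1 from each "1" up to just before the next "4"

# One row per element, showing each intermediate step.
steps <- data.frame(A, ones, fours, inside,
                    fill = inside > 0 & A == "Na")
print(steps)

# Same replacement as the one-liner, built from the pieces above.
result <- replace(A, steps$fill, 2)
print(result)
```

The fill column is TRUE only where the element lies after a 1, before the matching 4, and is itself "Na".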

NA values

R is case sensitive, and Na is not the same as NA. The sample data in the question originally showed Na values rather than NA values. If what was actually meant was a numeric vector with NA values, as in AA in the Note below, modify the expression as shown here:

replace(AA, cumsum(!is.na(AA) & AA == 1) - cumsum(!is.na(AA) & AA == 4) & is.na(AA), 2)

giving:

[1] NA NA NA NA  1  2  2  4 NA NA  1  2  2  2  2  2  4 NA  1  2  4
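
If the same replacement is needed in several places, the NA version could be wrapped in a small function. This is a sketch under assumptions, not a tested utility: fill_between and its parameters start, end and fill are hypothetical names, and it tests the cumsum difference with > 0 instead of relying on nonzero values being TRUE. Like the one-liner, it assumes the markers alternate 1, 4, 1, 4, ... beginning with a 1, as they do in the question's data.

```r
# fill_between() is a hypothetical helper name, not part of base R.
fill_between <- function(x, start = 1, end = 4, fill = 2) {
  is_start <- !is.na(x) & x == start   # NA-safe test for the opening value
  is_end   <- !is.na(x) & x == end     # NA-safe test for the closing value
  # Positive from each start value up to just before the next end value.
  inside <- (cumsum(is_start) - cumsum(is_end)) > 0
  replace(x, inside & is.na(x), fill)
}

AA <- c(NA, NA, NA, NA, 1, NA, NA, 4, NA, NA, 1,
        NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)
print(fill_between(AA))

# Caveat: a start value with no later end value still opens a range,
# so the trailing NA after the unmatched 1 is filled as well.
print(fill_between(c(1, NA, 4, NA, 1, NA)))
```

If trailing NAs after an unmatched 1 should stay NA, an explicit loop that only fills once the closing 4 is actually seen is the safer choice.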

Note

A <- c("Na", "Na", "Na", "Na", "1", "Na", "Na", "4", "Na", "Na", 
"1", "Na", "Na", "Na", "Na", "Na", "4", "Na", "1", "Na", "4")

AA <- as.numeric(replace(A, A == "Na", NA))
G. Grothendieck answered Oct 01 '22 00:10


I'm sure there's a better solution to this problem but this should do the trick:

A <-
  c(NA, NA, NA, NA, 1, NA, NA, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)

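replace <- FALSE  # TRUE while we are between a 1 and the next 4

for (i in seq_along(A)) {
  if (!is.na(A[i])) {
    if (A[i] == 1) {
      start <- i + 1  # first position after the 1
      replace <- TRUE
    }
    if (A[i] == 4 && replace) {
      # skip the fill when the 4 directly follows the 1,
      # otherwise start:(i - 1) would count down and overwrite both
      if (start < i) A[start:(i - 1)] <- 2
      replace <- FALSE
    }
  }
}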

EDIT: if you only want to replace the NAs when there's nothing else (for example a 3) between the 1 and the 4, you could use this:

A <-
  c(NA, NA, NA, NA, 1, NA, 3, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)

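replace <- FALSE  # TRUE while we are between a 1 and the next 4

for (i in seq_along(A)) {
  if (!is.na(A[i])) {
    if (A[i] == 1) {
      start <- i + 1  # first position after the 1
      replace <- TRUE
    }
    if (A[i] == 4 && replace) {
      # skip the fill when the 4 directly follows the 1
      if (start < i) A[start:(i - 1)] <- 2
      replace <- FALSE
    }
    if (A[i] != 4 && A[i] != 1) {
      replace <- FALSE  # any other value cancels the pending fill
    }
  }
}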

Output:

> A
 [1] NA NA NA NA  1 NA  3  4 NA NA  1  2  2  2  2  2  4 NA  1  2  4

And if you only want to replace NAs but keep other values between 1 and 4 use this:

A <-
  c(NA, NA, NA, NA, 1, NA, 3, 4, NA, NA, 1, NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)

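replace <- FALSE  # TRUE while we are between a 1 and the next 4

for (i in seq_along(A)) {
  if (!is.na(A[i])) {
    if (A[i] == 1) {
      start <- i + 1  # first position after the 1
      replace <- TRUE
    }
    if (A[i] == 4 && replace) {
      if (start < i) {  # skip the fill when the 4 directly follows the 1
        sub <- A[start:(i - 1)]
        sub[is.na(sub)] <- 2  # only NAs are replaced; other values are kept
        A[start:(i - 1)] <- sub
      }
      replace <- FALSE
    }
  }
}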

Output:

> A
 [1] NA NA NA NA  1  2  3  4 NA NA  1  2  2  2  2  2  4 NA  1  2  4
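
For reuse, the last variant could be wrapped in a function so the input vector is not modified in place. This is a rough sketch, assuming numeric input; fill_loop, start_val, end_val and fill are hypothetical names. It builds the index with seq_len() so that a 4 directly after a 1 yields an empty range instead of a descending one.

```r
# fill_loop() is a hypothetical wrapper around the loop above.
fill_loop <- function(x, start_val = 1, end_val = 4, fill = 2) {
  filling <- FALSE            # TRUE while between a start and the next end
  for (i in seq_along(x)) {
    if (is.na(x[i])) next     # NAs carry no marker information
    if (x[i] == start_val) {
      first <- i + 1          # first candidate position after the start
      filling <- TRUE
    } else if (x[i] == end_val && filling) {
      # Positions first..(i - 1); empty when the end directly follows the start.
      idx <- seq_len(i - first) + first - 1
      x[idx[is.na(x[idx])]] <- fill   # replace only the NAs in the range
      filling <- FALSE
    }
  }
  x
}

B <- c(NA, NA, NA, NA, 1, NA, 3, 4, NA, NA, 1,
       NA, NA, NA, NA, NA, 4, NA, 1, NA, 4)
print(fill_loop(B))
print(fill_loop(c(1, 4, NA)))            # adjacent 1 and 4: nothing to fill
print(fill_loop(c(1, NA, 4, NA, 1, NA))) # unmatched trailing 1: NA is kept
```

Because the fill only happens once the closing 4 is reached, NAs after an unmatched 1 are left alone.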
brettljausn answered Oct 01 '22 01:10