
Convert to NA after a specific value by row

Tags: dataframe, r

Imagine the following data frame:

#  ID v1 v2 v3 v4
#1  H  0  0  d  0
#2  I  0  0  0  0
#3  J  d  0  0  0
#4  K  0  0  0  d
#5  L  0  d  0  0

There is either one or no d per row.

For each row, I want to convert everything after d to NA. Desired result:

#  ID v1  v2  v3  v4
#1  H  0   0   d  NA
#2  I  0   0   0   0
#3  J  d  NA  NA  NA
#4  K  0   0   0   d
#5  L  0   d  NA  NA

DATA

df <- data.frame(ID = LETTERS[8:12], 
                 v1 = c(0, 0, 'd', 0, 0), 
                 v2 = c(0, 0, 0, 0, 'd'), 
                 v3 = c('d', 0, 0, 0, 0), 
                 v4 = c(0, 0, 0, 'd', 0), 
      stringsAsFactors = FALSE)

Sotos asked Sep 20 '21 09:09




6 Answers

Using cummax:

ix = df == "d"
df[t(apply(ix, 1, cummax)) & !ix] = NA
#   ID v1   v2   v3   v4
# 1  H  0    0    d <NA>
# 2  I  0    0    0    0
# 3  J  d <NA> <NA> <NA>
# 4  K  0    0    0    d
# 5  L  0    d <NA> <NA>
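
To see why the mask works, it may help to trace a single row. The following is only a sketch: `ix_row`, `seen_d` and `to_na` are hypothetical names that do not appear in the answer, and `ix_row` stands for one row of `ix`.

```r
# One row of `ix`: TRUE marks the position of "d" (hypothetical example row)
ix_row <- c(FALSE, FALSE, TRUE, FALSE)

# cummax() treats the logicals as 0/1 and carries the 1 forward
# from the "d" position to the end of the row
seen_d <- cummax(ix_row)

# Excluding the "d" position itself leaves only the cells after it
to_na <- seen_d & !ix_row
```

Only the last element of `to_na` is TRUE here, so only the cell after `d` would be set to NA.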

To increase speed, replace apply with collapse::dapply:

ix = df == "d"
df[collapse::dapply(ix, cummax, MARGIN = 1) & !ix] = NA

Or use matrixStats::rowCummaxs:

ix = df == "d"
df[matrixStats::rowCummaxs(ix) & !ix] = NA

For pre-0.62.0 matrixStats, see previous revision.

Henrik answered Oct 23 '22 09:10


Two alternative solutions:

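# option 1
w <- which(df == "d", arr.ind = TRUE)                # (row, col) position of every "d"
w <- w[w[, 2] < ncol(df), , drop = FALSE]            # a "d" in the last column needs no change
reps <- ncol(df) - w[, 2]                            # number of cells to blank per row
w <- w[rep(seq_len(nrow(w)), reps), , drop = FALSE]  # one index row per cell to blank
w[, 2] <- w[, 2] + sequence(reps)                    # shift to the columns after "d"

df[w] <- NA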

# option 2
mc <- ncol(df) - max.col(df == "d", ties.method = "first")
mc[mc >= (ncol(df) - 1)] <- 0
rr <- rep(seq_along(mc), mc)
cc <- rep(ncol(df) - mc, mc) + unlist(sapply(mc, seq)[mc > 0])

df[cbind(rr, cc)] <- NA

Both options give the desired result.
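
Both options end by assigning through a two-column matrix of (row, column) pairs. As a minimal illustration of that indexing step alone, the sketch below uses a hypothetical data frame `toy` and index matrix `idx`, neither of which comes from the answer:

```r
# Small hypothetical data frame (not the one from the question)
toy <- data.frame(id = c("x", "y"),
                  v1 = c("0", "d"),
                  v2 = c("0", "0"),
                  stringsAsFactors = FALSE)

# Each row of `idx` is one (row, column) pair to overwrite
idx <- cbind(row = c(2, 1), col = c(3, 2))

# Matrix indexing replaces exactly the listed cells
toy[idx] <- NA
```

After the assignment only `toy[2, 3]` and `toy[1, 2]` are NA; every other cell is untouched.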

Jaap answered Oct 23 '22 08:10


My own solution is:

f1 <- function(x) {
  i1 <- which(x == 'd') + 1              # position right after "d", if there is one
  if (length(i1) > 0 && i1 <= length(x)) {
    x[i1:length(x)] <- NA                # blank everything after "d"
  }
  x                                      # rows without "d", or with "d" last, are unchanged
}
df[-1] <- t(apply(df[-1], 1, f1))

which gives,

#  ID v1   v2   v3   v4
#1  H  0    0    d <NA>
#2  I  0    0    0    0
#3  J  d <NA> <NA> <NA>
#4  K  0    0    0    d
#5  L  0    d <NA> <NA>
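
The three cases a row function has to handle ("d" in the middle, "d" last, no "d") can be checked on plain vectors. The helper below, `na_after_d`, is a hypothetical variant written for illustration and is not the function from the answer; it assumes at most one "d" per vector.

```r
# Hypothetical helper (not from the original answer): set everything
# after the first "d" in a character vector to NA
na_after_d <- function(x) {
  # Position of "d"; if absent, fall back to the last index so nothing changes
  pos <- match("d", x, nomatch = length(x))
  x[seq_along(x) > pos] <- NA
  x
}

mid_d  <- na_after_d(c("0", "d", "0", "0"))  # "d" in the middle
last_d <- na_after_d(c("0", "0", "0", "d"))  # "d" last: unchanged
no_d   <- na_after_d(c("0", "0", "0", "0"))  # no "d": unchanged
```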

Sotos answered Oct 23 '22 09:10


Here are two base R one-liners.

1) Reduce. Because this operates on entire columns at a time instead of row by row, it should be particularly fast if there are many rows and not many columns.

replace(df, TRUE, Reduce(function(x, y) ifelse(x == "d", NA, y), df, accumulate = TRUE))

giving:

  ID v1   v2   v3   v4
1  H  0    0    d <NA>
2  I  0    0    0    0
3  J  d <NA> <NA> <NA>
4  K  0    0    0    d
5  L  0    d <NA> <NA>
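
To illustrate how the accumulation carries NA forward, here is a small sketch on a hypothetical list of columns, `cols` (not data from the question). Each step compares the previous accumulated column with "d"; once a cell is NA the comparison is NA too, so `ifelse` keeps returning NA for that row.

```r
# Hypothetical columns: row 1 has "d" in v1, row 2 has "d" in v2
cols <- list(id = c("A", "B"),
             v1 = c("d", "0"),
             v2 = c("0", "d"),
             v3 = c("0", "0"))

# x is the previous (already processed) column, y the next raw column
out <- Reduce(function(x, y) ifelse(x == "d", NA, y), cols, accumulate = TRUE)
```

Here `out[[3]]` keeps the "d" of row 2 but is NA for row 1, and `out[[4]]` is NA in both rows.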

2) read.table. This assumes that the only occurrences of d are in cells consisting of a single d (which is the case for the example in the question).

replace(df, df!="d"&is.na(read.table(text=do.call(paste,df), comment="d", fill=NA)), NA)

giving:

  ID v1   v2   v3   v4
1  H  0    0    d <NA>
2  I  0    0    0    0
3  J  d <NA> <NA> <NA>
4  K  0    0    0    d
5  L  0    d <NA> <NA>

G. Grothendieck answered Oct 23 '22 09:10


Another version using col and max.col:

df[-1][col(df[-1]) > max.col(df[-1] == "d", "last")] <- NA
df

#  ID v1   v2   v3   v4
#1  H  0    0    d <NA>
#2  I  0    0    0    0
#3  J  d <NA> <NA> <NA>
#4  K  0    0    0    d
#5  L  0    d <NA> <NA>
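
The comparison of `col()` with `max.col()` can be traced on a small matrix. This is a sketch only: `m`, `last_d` and `mask` are hypothetical names that do not appear in the answer.

```r
# Hypothetical 2 x 3 character matrix: row 1 has "d" in column 2, row 2 has none
m <- matrix(c("0", "d", "0",
              "0", "0", "0"), nrow = 2, byrow = TRUE)

# Column of the "d" in each row; a row without "d" is an all-FALSE tie,
# which the "last" ties method resolves to the final column
last_d <- max.col(m == "d", "last")

# col(m) holds each cell's column number; cells to the right of "d" are TRUE
mask <- col(m) > last_d
```

Only `mask[1, 3]` is TRUE, so a row without `d` is never modified.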

thelatemail answered Oct 23 '22 09:10


An alternative using data.table:

library(data.table)
setDT(df)

df[, names(df)[-1] := {
       x <- unlist(.SD)
       if (any(x == "d")) {          # if there's no "d", no need to do anything
         whd <- which(x == "d")
         if (whd != length(x)) {     # if "d" is at the end, nothing to be done either
           x[(whd + 1):length(x)] <- NA
         }
       }
       as.list(x)                    # return the row as a list so the structure is kept
     },
   .SDcols = -1, by = 1:nrow(df)]    # you need to do a "by row" operation

Cath answered Oct 23 '22 09:10