Convert to NA after a specific value by row

Tags:

dataframe

r

Imagine the following data frame:

#  ID v1 v2 v3 v4
#1  H  0  0  d  0
#2  I  0  0  0  0
#3  J  d  0  0  0
#4  K  0  0  0  d
#5  L  0  d  0  0

There is either one or no d per row.

For each row, I want to convert everything after d to NA. Desired result:

#  ID v1  v2  v3  v4
#1  H  0   0   d  NA
#2  I  0   0   0   0
#3  J  d  NA  NA  NA
#4  K  0   0   0   d
#5  L  0   d  NA  NA

DATA

df <- data.frame(ID = LETTERS[8:12], 
                 v1 = c(0, 0, 'd', 0, 0), 
                 v2 = c(0, 0, 0, 0, 'd'), 
                 v3 = c('d', 0, 0, 0, 0), 
                 v4 = c(0, 0, 0, 'd', 0), 
      stringsAsFactors = FALSE)

asked Sep 20 '21 09:09

Sotos

6 Answers

Using cummax:

ix = df == "d"
df[t(apply(ix, 1, cummax)) & !ix] = NA
#   ID v1   v2   v3   v4
# 1  H  0    0    d <NA>
# 2  I  0    0    0    0
# 3  J  d <NA> <NA> <NA>
# 4  K  0    0    0    d
# 5  L  0    d <NA> <NA>
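To see why the mask works, here is a minimal sketch on a single hypothetical row of ix (row H of the example: ID, v1, v2, v3 = d, v4). cummax carries the first TRUE forward as a run of 1s, and & !ix then removes the d cell itself. The second part shows why the answer wraps apply in t(): apply(..., 1, f) returns one column per input row. The small matrix ix below is made up for illustration.

```r
# Hypothetical single row of ix = df == "d" (row H: the d sits in column 4)
row <- c(FALSE, FALSE, FALSE, TRUE, FALSE)

# cummax coerces the logicals to integer and carries the first 1 forward
running <- cummax(row)        # 0 0 0 1 1
mask <- running & !row        # TRUE only strictly after the d
print(mask)

# Hypothetical 2 x 3 indicator matrix: apply() over rows returns one
# column per input row, so its result has to be transposed back with t()
ix <- rbind(c(FALSE, TRUE, FALSE),
            c(FALSE, FALSE, FALSE))
by_row <- apply(ix, 1, cummax)
print(dim(by_row))            # 3 2, i.e. transposed relative to ix
print(t(by_row) & !ix)        # TRUE only at row 1, column 3
```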

To increase speed, replace apply with collapse::dapply:

ix = df == "d"
df[collapse::dapply(ix, cummax, MARGIN = 1) & !ix] = NA

Or use matrixStats::rowCummaxs:

ix = df == "d"
df[matrixStats::rowCummaxs(ix) & !ix] = NA

For matrixStats versions older than 0.62.0, see the previous revision of this answer.

answered Oct 23 '22 09:10

Henrik

Two alternative solutions:

# option 1
w <- which(df == "d", arr.ind = TRUE)
w <- w[w[, 2] < ncol(df), , drop = FALSE]          # a d in the last column needs nothing
reps <- ncol(df) - w[, 2]                          # number of cells after each d
w <- w[rep(seq_len(nrow(w)), reps), , drop = FALSE]
w[, 2] <- w[, 2] + unlist(lapply(reps, seq_len))

df[w] <- NA

# option 2
mc <- ncol(df) - max.col(df == "d", ties.method = "first")
mc[mc >= (ncol(df) - 1)] <- 0                      # rows without a d
rr <- rep(seq_along(mc), mc)
cc <- rep(ncol(df) - mc, mc) + unlist(lapply(mc, seq_len))

df[cbind(rr, cc)] <- NA

Both give the desired result.
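Both options end with the same move: a two-column (row, column) integer matrix used to index individual cells of the data frame. A minimal sketch of the option 1 expansion on a hypothetical two-row data frame toy (not the question's data), using drop = FALSE and lapply/seq_len so the shapes stay the same even if only one d is left after filtering:

```r
# Hypothetical toy data: row 1 has d in v3 (column 4), row 2 has d in v2 (column 3)
toy <- data.frame(ID = c("A", "B"),
                  v1 = c("0", "0"), v2 = c("0", "d"),
                  v3 = c("d", "0"), v4 = c("0", "0"),
                  stringsAsFactors = FALSE)

pos <- which(toy == "d", arr.ind = TRUE)            # (row, col) of each d
pos <- pos[pos[, 2] < ncol(toy), , drop = FALSE]    # a d in the last column needs nothing
reps <- ncol(toy) - pos[, 2]                        # cells to the right of each d

# Repeat each position 'reps' times, then shift its column by 1, 2, ..., reps
idx <- pos[rep(seq_len(nrow(pos)), reps), , drop = FALSE]
idx[, 2] <- idx[, 2] + unlist(lapply(reps, seq_len))

# A numeric two-column matrix addresses single cells of a data frame
toy[idx] <- NA
print(toy)
```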

answered Oct 23 '22 08:10

Jaap

My version for solving it is:

f1 <- function(x) {
  i1 <- which(x == "d") + 1               # position just after the d (if any)
  if (length(i1) > 0 && i1 <= length(x))  # a d exists and is not the last cell
    x[i1:length(x)] <- NA
  x
}
df[-1] <- t(apply(df[-1], 1, f1))

which gives:

#  ID v1   v2   v3   v4
#1  H  0    0    d <NA>
#2  I  0    0    0    0
#3  J  d <NA> <NA> <NA>
#4  K  0    0    0    d
#5  L  0    d <NA> <NA>
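A quick check of f1 on single rows; the input vectors are hypothetical and f1 is repeated so the snippet runs on its own. Because && short-circuits, i1 <= length(x) is only evaluated when a d was found, and the function relies on there being at most one d per row, as the question guarantees.

```r
f1 <- function(x) {
  i1 <- which(x == "d") + 1               # position just after the d (if any)
  if (length(i1) > 0 && i1 <= length(x))  # a d exists and is not the last cell
    x[i1:length(x)] <- NA
  x
}

r1 <- f1(c("0", "0", "d", "0"))   # d in the middle: the trailing cell becomes NA
r2 <- f1(c("0", "0", "0", "0"))   # no d: unchanged
r3 <- f1(c("0", "0", "0", "d"))   # d in the last cell: unchanged
print(r1)
```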

answered Oct 23 '22 09:10

Sotos

Here are two base R one-liners.

1) Reduce. Because this operates on entire columns at a time rather than row by row, it should be particularly fast when there are many rows and few columns.

replace(df, TRUE, Reduce(function(x, y) ifelse(x == "d", NA, y), df, accumulate = TRUE))

giving:

  ID v1   v2   v3   v4
1  H  0    0    d <NA>
2  I  0    0    0    0
3  J  d <NA> <NA> <NA>
4  K  0    0    0    d
5  L  0    d <NA> <NA>

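2) read.table. This pastes each row into one line and reads it back with d as the comment character, so everything from the d onward is dropped and comes back as NA; df != "d" then keeps the d cell itself. It assumes that the only occurrences of d are in cells consisting of a single d (which is the case for the example in the question).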

replace(df, df!="d"&is.na(read.table(text=do.call(paste,df), comment="d", fill=NA)), NA)

giving:

  ID v1   v2   v3   v4
1  H  0    0    d <NA>
2  I  0    0    0    0
3  J  d <NA> <NA> <NA>
4  K  0    0    0    d
5  L  0    d <NA> <NA>
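Both one-liners can be seen on a hypothetical two-row input. With accumulate = TRUE, Reduce keeps every intermediate column, and a column becomes NA wherever the previous accumulated column is d or already NA (NA == "d" is NA, and ifelse returns NA for an NA test). For read.table, d starts a comment, so the rest of that line is dropped and the short line is padded; this sketch passes fill = TRUE, the documented logical form, where the answer uses fill = NA.

```r
# Hypothetical columns: row 1 has a d in the 2nd column, row 2 has none
cols <- list(c("0", "0"), c("d", "0"), c("0", "0"), c("0", "0"))

# 1) Reduce: each column is NA wherever the previous accumulated column is d or NA
acc <- Reduce(function(x, y) ifelse(x == "d", NA, y), cols, accumulate = TRUE)
print(acc)        # elements 3 and 4 are NA in row 1 only

# 2) read.table: "d" starts a comment, so the rest of that line is dropped,
#    and fill = TRUE pads the shorter line with missing values
txt <- c("0 d 0 0", "0 0 0 0")
tab <- read.table(text = txt, comment.char = "d", fill = TRUE)
m <- is.na(tab)
print(m)          # row 1: the d cell and everything after it is NA
```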

answered Oct 23 '22 09:10

G. Grothendieck

Another version using col and max.col:

df[-1][col(df[-1]) > max.col(df[-1] == "d", "last")] <- NA
df

#  ID v1   v2   v3   v4
#1  H  0    0    d <NA>
#2  I  0    0    0    0
#3  J  d <NA> <NA> <NA>
#4  K  0    0    0    d
#5  L  0    d <NA> <NA>
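The comparison works because max.col(..., "last") returns the column of the d in each row and, for a row with no d (all values tied), the last column, so col(...) > ... is never TRUE in that row. With the default ties.method = "random", rows without a d would instead get a random column. A minimal sketch on a hypothetical 2 x 3 indicator matrix m:

```r
# Hypothetical d-indicator matrix: row 1 has a d in column 2, row 2 has none
m <- rbind(c(FALSE, TRUE, FALSE),
           c(FALSE, FALSE, FALSE))

last_d <- max.col(m, ties.method = "last")  # 2 for row 1; 3 (the last column) for row 2
after_d <- col(m) > last_d                  # last_d is recycled down the rows
print(last_d)
print(after_d)                              # TRUE only at row 1, column 3
```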

answered Oct 23 '22 09:10

thelatemail

An alternative with data.table:

library(data.table)
setDT(df)

df[, names(df)[-1] := {
       x <- unlist(.SD)
       if (any(x == "d")) {        # if there's no "d", there is nothing to do
         whd <- which(x == "d")
         if (whd != length(x)) {   # if "d" is in the last column, nothing to do either
           x[(whd + 1):length(x)] <- NA
         }
       }
       as.list(x)                  # return the row as a list so the structure is kept
     },
   .SDcols = -1, by = 1:nrow(df)]  # this has to be a "by row" operation

answered Oct 23 '22 09:10

Cath
