Subsetting a data.frame with an integer matrix

Question

I keep running into this and am wondering if there's an easy work-around. For some situations I find it more logical to think about subsetting in terms of a two-column matrix of (row, column) indices, as with a matrix:

N <- 12
N.NA <- 6
dat <- data.frame(V1=runif(N),V2=runif(N))
sel.mat <- matrix(c(sample(seq(N),N.NA),sample(ncol(dat),N.NA,replace=TRUE)),ncol=2)
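Each row of sel.mat is one (row, column) pair, so extracting with it returns one cell per pair, in the order of the rows of sel.mat. A small check of that reading; the seed and the names picked and by.pair are only illustrative:

```r
set.seed(1)  # illustrative seed so the random draw is reproducible
N <- 12
N.NA <- 6
dat <- data.frame(V1 = runif(N), V2 = runif(N))
sel.mat <- matrix(c(sample(seq(N), N.NA), sample(ncol(dat), N.NA, replace = TRUE)), ncol = 2)

# Matrix-style extraction: one value per (row, column) pair
picked <- dat[sel.mat]

# The same values, looked up one pair at a time
by.pair <- vapply(seq_len(nrow(sel.mat)),
                  function(k) dat[sel.mat[k, 1], sel.mat[k, 2]],
                  numeric(1))
```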

This works for selection, but not for replacement:

> dat[sel.mat]
[1] 0.2582569 0.8455966 0.8828083 0.5384263 0.9574810 0.5623158
> dat[sel.mat] <- NA
Error in `[<-.data.frame`(`*tmp*`, sel.mat, value = NA) : 
  only logical matrix subscripts are allowed in replacement

I realize that there's a reason for the error message (it wouldn't know what to do if you had multiple replacements pointing to the same element), but that doesn't stop R from allowing integer replacement on vectors (e.g. dat$V1[c(2,3)] <- NA).
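For comparison, plain vectors and matrices resolve duplicate indices by applying the assignments in order, so the last value wins. A minimal illustration with arbitrary values:

```r
# Vector: index 2 is assigned twice; the second value is kept
x <- c(1, 2, 3, 4, 5)
x[c(2, 2)] <- c(10, 20)

# Matrix: cell [1, 2] is addressed twice by the index matrix
m <- matrix(0, nrow = 2, ncol = 2)
m[cbind(c(1, 1), c(2, 2))] <- c(7, 8)
```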

Is there a convenient way to allow replacement by integer matrix?

agstudy · Accepted Answer

Convert it to a matrix:

dat.m <- as.matrix(dat)
dat.m[sel.mat] <- NA
> dat.m
             V1         V2
 [1,] 0.2539189         NA
 [2,] 0.5216975         NA
 [3,] 0.1206138 0.14714848
 [4,] 0.2841779 0.52352209
 [5,] 0.3965337         NA
 [6,] 0.1871074 0.23747235
 [7,] 0.2991774         NA
 [8,]        NA 0.09509202
 [9,] 0.4636460 0.59384430
[10,] 0.5493738 0.92334630
[11,] 0.7160894         NA
[12,] 0.9568567 0.80398264
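If you need a data frame again afterwards, a sketch of the round trip is below (the seed is illustrative). It assumes all columns share one type: as.matrix() on a data frame with mixed column types produces a character matrix, which this approach would then carry back into the data frame.

```r
set.seed(1)  # illustrative seed
N <- 12
N.NA <- 6
dat <- data.frame(V1 = runif(N), V2 = runif(N))
sel.mat <- cbind(sample(seq_len(N), N.NA), sample(ncol(dat), N.NA, replace = TRUE))

dat.m <- as.matrix(dat)
dat.m[sel.mat] <- NA
dat <- as.data.frame(dat.m)  # back to a data frame; numeric columns stay numeric

# Caveat: mixed column types are coerced to a character matrix
mixed <- data.frame(num = 1:2, chr = c("a", "b"), stringsAsFactors = FALSE)
mixed.m <- as.matrix(mixed)
```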

Edit: to explain why we get an error with a data.frame,

dat[sel.mat] <- NA

is equivalent to the following:

temp <- dat
dat <- "[<-"(temp, sel.mat, value=NA)

Error in `[<-.data.frame`(temp, sel.mat, value = NA) : 
  only logical matrix subscripts are allowed in replacement

Now I can do the following, and it works:

dat <- "[<-"(as.matrix(temp), sel.mat, value=NA)
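A small check that the replacement syntax and the explicit `[<-` call give the same object when applied to a matrix copy. The toy values and the names a and b are illustrative. Note that the result is a matrix, not a data frame; wrap it in as.data.frame() if you need one.

```r
temp <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(0.4, 0.5, 0.6))
sel <- cbind(c(1, 3), c(2, 1))  # cells [1, 2] and [3, 1]

# Replacement syntax on a fresh matrix copy
a <- as.matrix(temp)
a[sel] <- NA

# The same replacement written as an explicit call to `[<-`
b <- `[<-`(as.matrix(temp), sel, value = NA)
```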

Sven Hohenstein · Answer

You could create a logical matrix based on the integer matrix:

log.mat <- matrix(FALSE, nrow(dat), ncol(dat))
log.mat[sel.mat] <- TRUE

This matrix could be used for replacing values in the data frame with NA (or other values):

is.na(dat) <- log.mat

The result:

           V1         V2
1  0.76063534         NA
2  0.27713051 0.10593451
3  0.74301263 0.77689458
4  0.42202155         NA
5  0.54563816 0.10233017
6          NA 0.05818723
7  0.83531963 0.93805113
8  0.99316128 0.61505393
9  0.08743757         NA
10 0.95510231 0.51267338
11 0.14035257         NA
12 0.59408022         NA

This keeps the original object as a data frame, so its columns can still have different types.
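The same idea as a small reusable sketch, applied to a data frame with one numeric and one character column to show that the column types survive. index_to_mask is a hypothetical helper name, not part of base R:

```r
# Hypothetical helper: turn a two-column (row, column) index matrix into a
# logical mask with the same shape as the data frame
index_to_mask <- function(idx, nr, nc) {
  mask <- matrix(FALSE, nr, nc)
  mask[idx] <- TRUE
  mask
}

df <- data.frame(num = c(1.5, 2.5, 3.5), chr = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
idx <- cbind(c(1, 3), c(2, 1))  # cells [1, "chr"] and [3, "num"]

mask <- index_to_mask(idx, nrow(df), ncol(df))
is.na(df) <- mask  # logical-matrix replacement keeps each column's type
```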

Josh O'Brien · Answer

FWIW, matrix indexing with replacement does work in the current R-devel snapshot (and will be a part of R-3.0.0). Obviously someone in R-core had the same wish as you did.

As documented in the R-devel NEWS file:

Matrix indexing of dataframes by two-column numeric indices is now supported for replacement as well as extraction.

A demonstration:

dat[sel.mat]
## [1] 0.3355509 0.4114056 0.2334332 0.6597042 0.7707762 0.7783584
dat[sel.mat] <- NA
dat[sel.mat]
## [1] NA NA NA NA NA NA

R.version.string
# [1] "R Under development (unstable) (2012-12-29 r61478)"
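A minimal check, assuming R 3.0.0 or later, that a two-column index matrix of doubles (not only integers) works for replacement and that each value lands in the cell named by its row of the index. The toy values are arbitrary:

```r
stopifnot(getRversion() >= "3.0.0")  # replacement support arrived in R 3.0.0

df <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(0.4, 0.5, 0.6))
idx <- cbind(c(1, 2), c(2, 1))  # cells [1, 2] and [2, 1], stored as doubles

df[idx] <- c(-1, -2)  # -1 goes to [1, 2], -2 goes to [2, 1]
```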

wush978 · Answer

In R, the expressions

dat[sel.mat]
dat[sel.mat] <- NA

dispatch to S3 methods and are equivalent to

`[.data.frame`(x=dat, i=sel.mat)
dat <- `[<-.data.frame`(x=dat, i=sel.mat, value=NA)

since class(dat) is "data.frame".

You may look into the source code of

`[.data.frame`
`[<-.data.frame`

and modify them to do what you want.
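A short sketch checking the equivalence by calling the methods directly, and retrieving the method source with utils::getS3method(). It assumes R 3.0.0 or later, where the data frame method accepts a numeric index matrix for replacement; the toy data is illustrative.

```r
dat <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(0.4, 0.5, 0.6))
sel.mat <- cbind(c(1, 3), c(2, 1))

# Extraction: bracket syntax vs. calling the method directly
e1 <- dat[sel.mat]
e2 <- `[.data.frame`(dat, sel.mat)

# Replacement: the syntax calls the method and rebinds the result
r1 <- dat
r1[sel.mat] <- NA
r2 <- `[<-.data.frame`(dat, sel.mat, value = NA)

# The method's source, e.g. print(src) to read it before adapting it
src <- utils::getS3method("[<-", "data.frame")
```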

In your case, maybe you want:

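`[<-.data.frame` <- function(x, i, j, value) {
  # Only handle x[m] <- value where m is a two-column numeric (row, col) matrix;
  # nargs() == 3 distinguishes x[i] <- value from x[i, j] <- value
  index.mat <- nargs() == 3L && !missing(i) && is.matrix(i) &&
    is.numeric(i) && ncol(i) == 2L
  if (!index.mat) {
    # Delegate everything else to base R's method, keeping the call form,
    # because the base method also inspects nargs()
    if (nargs() == 3L) return(base::`[<-.data.frame`(x, i, value = value))
    return(base::`[<-.data.frame`(x, i, j, value = value))
  }
  # Recycle value to one element per row of the index matrix
  if (nrow(i) %% length(value) != 0L)
    warning("number of items to replace is not a multiple of replacement length")
  value <- rep_len(value, nrow(i))
  # Assign cell by cell with ordinary two-index assignments
  for (index in seq_len(nrow(i))) {
    x[i[index, 1], i[index, 2]] <- value[index]
  }
  x
}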

Try it:

N <- 12
N.NA <- 6
dat <- data.frame(V1=runif(N),V2=runif(N))
sel.mat <- matrix(c(sample(seq(N),N.NA),sample(ncol(dat),N.NA,replace=TRUE)),ncol=2)
dat[sel.mat] <- NA
dat
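Masking base's `[<-.data.frame` affects every data frame assignment in the session. A less invasive variant of the same loop is a standalone function; assign_at is a hypothetical name, not part of base R. Since R 3.0.0, plain dat[sel.mat] <- value covers the common case of distinct cells without either workaround.

```r
# Hypothetical helper (not part of base R): assign value into the cells of a
# data frame addressed by a two-column (row, column) index matrix
assign_at <- function(x, idx, value) {
  stopifnot(is.data.frame(x), is.matrix(idx), is.numeric(idx), ncol(idx) == 2L)
  value <- rep_len(value, nrow(idx))  # one value per index row
  for (k in seq_len(nrow(idx))) {
    x[idx[k, 1], idx[k, 2]] <- value[k]
  }
  x
}

set.seed(42)  # illustrative seed
N <- 12
N.NA <- 6
dat <- data.frame(V1 = runif(N), V2 = runif(N))
sel.mat <- cbind(sample(seq_len(N), N.NA), sample(ncol(dat), N.NA, replace = TRUE))
dat <- assign_at(dat, sel.mat, NA)
```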

Subsetting a data.frame with an integer matrix

Tags:

r

Ari B. Friedman

4 Answers

agstudy

Sven Hohenstein

Josh O'Brien

wush978
