Subsetting a data.frame with an integer matrix

Tags: r

I keep running into this and am wondering if there's an easy work-around. For some situations I find it more logical to think about subsetting a matrix in

N <- 12
N.NA <- 6
dat <- data.frame(V1=runif(N),V2=runif(N))
sel.mat <- matrix(c(sample(seq(N),N.NA),sample(ncol(dat),N.NA,replace=TRUE)),ncol=2)

This works for selection, but not for replacement:

> dat[sel.mat]
[1] 0.2582569 0.8455966 0.8828083 0.5384263 0.9574810 0.5623158
> dat[sel.mat] <- NA
Error in `[<-.data.frame`(`*tmp*`, sel.mat, value = NA) : 
  only logical matrix subscripts are allowed in replacement

I realize that there's a reason for the error message (it wouldn't know what to do if you had multiple replacements pointing to the same element), but that doesn't stop R from allowing integer replacement on vectors (e.g. dat$V1[c(2,3)] <- NA).
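
As a small illustration of the vector behaviour mentioned above (the values are made up for this example): when an index is repeated in a vector replacement, R applies the assignments in order, so the last value wins rather than raising an error.

```r
# Illustrative vector, not taken from the data above
x <- c(1, 2, 3)

# Index 2 appears twice; the later value (20) overwrites the earlier one (10)
x[c(2, 2)] <- c(10, 20)
x
```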

Is there a convenient way to allow replacement by integer matrix?

Ari B. Friedman asked Jan 24 '13 10:01


4 Answers

Convert it to a matrix:

dat.m <- as.matrix(dat)
dat.m[sel.mat] <- NA
> dat.m
             V1         V2
 [1,] 0.2539189         NA
 [2,] 0.5216975         NA
 [3,] 0.1206138 0.14714848
 [4,] 0.2841779 0.52352209
 [5,] 0.3965337         NA
 [6,] 0.1871074 0.23747235
 [7,] 0.2991774         NA
 [8,]        NA 0.09509202
 [9,] 0.4636460 0.59384430
[10,] 0.5493738 0.92334630
[11,] 0.7160894         NA
[12,] 0.9568567 0.80398264

Edit: why the data.frame version raises an error

dat[sel.mat] <- NA

is equivalent to the following:

temp <- dat
dat <- "[<-"(temp, sel.mat, value=NA)

 Error in `[<-.data.frame`(temp, sel.mat, value = NA) : 
 only logical matrix subscripts are allowed in replacement

Now I can do the following, and it works:

dat <- "[<-"(as.matrix(temp), sel.mat, value=NA)
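
One caveat worth keeping in mind with this approach: the result is a matrix, not a data frame, and as.matrix coerces columns of mixed types to a common type. A minimal sketch, assuming an all-numeric data frame like the one in the question (the fixed index pairs and the `mixed` data frame are illustrative, not from the question), of converting back afterwards:

```r
# Illustrative data: two numeric columns, as in the question
set.seed(1)
dat <- data.frame(V1 = runif(4), V2 = runif(4))

# Hypothetical (row, column) index pairs chosen for this example
sel.mat <- cbind(c(1, 3), c(2, 1))

dat.m <- as.matrix(dat)        # numeric matrix, since every column is numeric
dat.m[sel.mat] <- NA           # integer-matrix replacement works on a matrix
dat2 <- as.data.frame(dat.m)   # convert back to a data frame

# With mixed column types, as.matrix() falls back to a character matrix
mixed <- data.frame(x = 1:2, y = c("a", "b"))
is.character(as.matrix(mixed))
```

If the data frame has non-numeric columns, the round trip through a matrix will not preserve their types, so one of the other approaches is probably a better fit.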
agstudy answered Sep 28 '22 04:09


You could create a logical matrix based on the integer matrix:

log.mat <- matrix(FALSE, nrow(dat), ncol(dat))
log.mat[sel.mat] <- TRUE

This matrix could be used for replacing values in the data frame with NA (or other values):

is.na(dat) <- log.mat

The result:

           V1         V2
1  0.76063534         NA
2  0.27713051 0.10593451
3  0.74301263 0.77689458
4  0.42202155         NA
5  0.54563816 0.10233017
6          NA 0.05818723
7  0.83531963 0.93805113
8  0.99316128 0.61505393
9  0.08743757         NA
10 0.95510231 0.51267338
11 0.14035257         NA
12 0.59408022         NA

This lets you keep the original object as a data frame, so its columns can still be of different types.
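
To make the point about column types concrete, here is a small sketch with a mixed-type data frame. The data and index pairs are made up for illustration; only the logical-mask technique comes from the answer above.

```r
# Illustrative mixed-type data frame (not from the question)
dat <- data.frame(id = 1:4, name = c("a", "b", "c", "d"),
                  stringsAsFactors = FALSE)

# Hypothetical (row, column) index pairs: (2, "id") and (4, "name")
sel.mat <- cbind(c(2, 4), c(1, 2))

# Build the logical mask from the integer matrix
log.mat <- matrix(FALSE, nrow(dat), ncol(dat))
log.mat[sel.mat] <- TRUE

# Replace the masked cells with NA
is.na(dat) <- log.mat

# Each column keeps its own type
sapply(dat, class)
```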

Sven Hohenstein answered Sep 28 '22 06:09


FWIW, matrix indexing with replacement does work in the current R-devel snapshot (and will be a part of R-3.0.0). Obviously someone in R-core had the same wish as you did.

As documented in the R-devel NEWS file:

Matrix indexing of dataframes by two-column numeric indices is now supported for replacement as well as extraction.

A demonstration:

dat[sel.mat]
## [1] 0.3355509 0.4114056 0.2334332 0.6597042 0.7707762 0.7783584
dat[sel.mat] <- NA
dat[sel.mat]
## [1] NA NA NA NA NA NA

R.version.string
# [1] "R Under development (unstable) (2012-12-29 r61478)"
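
A further sketch, assuming R 3.0.0 or later: replacement values can also be a vector, and as far as I can tell they are matched to the rows of the index matrix in order. The data and the name `idx` are illustrative only.

```r
# Illustrative data frame with known values
dat <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(0.4, 0.5, 0.6))

# Hypothetical index pairs: (row 3, col 1) first, then (row 1, col 2)
idx <- cbind(c(3, 1), c(1, 2))

# One replacement value per row of idx
dat[idx] <- c(10, 20)

# Extraction with the same matrix returns the values in idx order
dat[idx]
```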
Josh O'Brien answered Sep 28 '22 06:09


In R, the expressions

dat[sel.mat]
dat[sel.mat] <- NA

dispatch to S3 methods and are equivalent to

`[.data.frame`(x=dat, i=sel.mat)
`[<-.data.frame`(x=dat, i=sel.mat, value=NA)

since class(dat) is "data.frame".

You can inspect the source code of

`[.data.frame`
`[<-.data.frame`

and modify it to do what you want.


In your case, maybe you want:

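`[<-.data.frame` <- function(x, i, j, value) {
  # Only x[i] <- value with a two-column numeric matrix i is handled here;
  # every other call is passed on to the base method unchanged
  if (!(nargs() == 3L && !missing(i) && is.matrix(i) && is.numeric(i) && ncol(i) == 2L)) {
    # Keep the argument count: the base method uses nargs() to tell x[i] from x[i, j]
    if (nargs() == 3L) return(base::`[<-.data.frame`(x, i, value = value))
    return(base::`[<-.data.frame`(x, i, j, value))
  }
  n <- nrow(i)
  # Recycle value to one element per index row, warning on a partial fit
  if (length(value) < n && n %% length(value) != 0) {
    warning("number of items to replace is not a multiple of replacement length")
  }
  value <- rep_len(value, n)
  # Assign cell by cell; each x[row, col] <- v call falls through to the base method
  for (index in seq_len(n)) {
    x[i[index, 1], i[index, 2]] <- value[index]
  }
  x
}

Try it: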

N <- 12
N.NA <- 6
dat <- data.frame(V1=runif(N),V2=runif(N))
sel.mat <- matrix(c(sample(seq(N),N.NA),sample(ncol(dat),N.NA,replace=TRUE)),ncol=2)
dat[sel.mat] <- NA
dat
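
If masking the base method feels too invasive, the same idea can be sketched as an ordinary function. The name `assign_by_matrix` and the sample data are hypothetical; this is one possible shape for such a helper, not part of base R.

```r
# Hypothetical helper: assign `value` at the (row, column) pairs given by
# the two-column numeric matrix `idx`, leaving `[<-.data.frame` untouched
assign_by_matrix <- function(df, idx, value) {
  stopifnot(is.data.frame(df), is.matrix(idx), is.numeric(idx), ncol(idx) == 2L)
  # One replacement value per row of idx
  value <- rep_len(value, nrow(idx))
  # Work column by column so each column keeps its own type
  for (col in unique(idx[, 2])) {
    hit <- idx[, 2] == col
    df[[col]][idx[hit, 1]] <- value[hit]
  }
  df
}

# Illustrative usage with made-up data
dat <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(0.4, 0.5, 0.6))
dat <- assign_by_matrix(dat, cbind(c(3, 1, 2), c(1, 2, 2)), NA)
dat
```

Because it is a plain function, it has to be called explicitly, but it cannot change how other code's data frame assignments behave.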
wush978 answered Sep 28 '22 05:09