I keep running into this and am wondering if there's an easy work-around. For some situations I find it more logical to think about subsetting a matrix in
N <- 12
N.NA <- 6
dat <- data.frame(V1=runif(N),V2=runif(N))
sel.mat <- matrix(c(sample(seq(N),N.NA),sample(ncol(dat),N.NA,replace=TRUE)),ncol=2)
This works for selection, but not for replacement:
> dat[sel.mat]
[1] 0.2582569 0.8455966 0.8828083 0.5384263 0.9574810 0.5623158
> dat[sel.mat] <- NA
Error in `[<-.data.frame`(`*tmp*`, sel.mat, value = NA) :
only logical matrix subscripts are allowed in replacement
I realize that there's a reason for the error message (it wouldn't know what to do if you had multiple replacements pointing to the same element), but that doesn't stop R from allowing integer replacement on vectors (e.g. dat$V1[c(2,3)] <- NA).
Is there a convenient way to allow replacement by integer matrix?
Convert it to a matrix:
dat.m <- as.matrix(dat)
dat.m[sel.mat] <- NA
> dat.m
V1 V2
[1,] 0.2539189 NA
[2,] 0.5216975 NA
[3,] 0.1206138 0.14714848
[4,] 0.2841779 0.52352209
[5,] 0.3965337 NA
[6,] 0.1871074 0.23747235
[7,] 0.2991774 NA
[8,] NA 0.09509202
[9,] 0.4636460 0.59384430
[10,] 0.5493738 0.92334630
[11,] 0.7160894 NA
[12,] 0.9568567 0.80398264
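If the result has to be a data frame again, the matrix can be converted back afterwards. A minimal sketch, assuming every column is numeric (as.matrix() would otherwise coerce all columns to one common type, typically character); the name dat.back and the fixed seed are made up for this illustration:

```r
set.seed(1)  # fixed seed, only so the example is reproducible
N <- 12
N.NA <- 6
dat <- data.frame(V1 = runif(N), V2 = runif(N))
sel.mat <- matrix(c(sample(seq(N), N.NA),
                    sample(ncol(dat), N.NA, replace = TRUE)), ncol = 2)

dat.m <- as.matrix(dat)            # numeric matrix, since all columns are numeric
dat.m[sel.mat] <- NA               # integer-matrix replacement works on a matrix
dat.back <- as.data.frame(dat.m)   # hypothetical name: back to a data frame

str(dat.back)
```

With mixed column types this round trip loses the original types, so it is probably only suitable for all-numeric data.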
Edit: to explain why we get an error with a data.frame, note that
dat[sel.mat] <- NA
is equivalent to doing the following:
temp <- dat
dat <- "[<-"(temp, sel.mat, value=NA)
Error in `[<-.data.frame`(temp, sel.mat, value = NA) :
only logical matrix subscripts are allowed in replacement
Now I can do the following, and it works:
dat <- "[<-"(as.matrix(temp), sel.mat, value=NA)
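The same function-call form can be tried on a plain vector, where integer replacement is allowed. A small sketch (the names v and v2 are only for illustration) showing that the replacement function returns a modified copy and leaves its input untouched:

```r
v <- c(10, 20, 30)

# Calling the replacement function directly returns a modified copy of v
v2 <- `[<-`(v, 2, value = NA)

# The usual assignment syntax produces the same result
v3 <- v
v3[2] <- NA

print(v2)
print(identical(v2, v3))
```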
You could create a logical matrix based on the integer matrix:
log.mat <- matrix(FALSE, nrow(dat), ncol(dat))
log.mat[sel.mat] <- TRUE
This matrix could be used for replacing values in the data frame with NA (or other values):
is.na(dat) <- log.mat
The result:
V1 V2
1 0.76063534 NA
2 0.27713051 0.10593451
3 0.74301263 0.77689458
4 0.42202155 NA
5 0.54563816 0.10233017
6 NA 0.05818723
7 0.83531963 0.93805113
8 0.99316128 0.61505393
9 0.08743757 NA
10 0.95510231 0.51267338
11 0.14035257 NA
12 0.59408022 NA
This allows you to keep the original object as a data frame, so its columns can still have different types.
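For values other than NA, one possible variation is to assign column by column with ordinary vector indexing, which also keeps each column's type. The sketch below is only an illustration: set_cells is a hypothetical helper, not part of base R, and it assumes idx is a two-column (row, column) numeric matrix. Unlike a logical matrix, which is filled in column-major order, it pairs each value with its own index row.

```r
# Hypothetical helper: assign `value` to the cells addressed by the
# two-column (row, column) index matrix `idx`, one column at a time.
set_cells <- function(df, idx, value) {
  value <- rep(value, length.out = nrow(idx))  # recycle to one value per cell
  for (j in unique(idx[, 2])) {
    hit <- idx[, 2] == j                       # index rows that target column j
    df[[j]][idx[hit, 1]] <- value[hit]         # plain vector replacement
  }
  df
}

dat <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
idx <- matrix(c(1, 3, 2,    # rows
                1, 1, 2),   # columns
              ncol = 2)
res <- set_cells(dat, idx, NA)
res
```

Here the cells (1, 1), (3, 1) and (2, 2) become NA while V1 stays numeric and V2 stays character.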
FWIW, matrix indexing with replacement does work in the current R-devel snapshot (and will be a part of R-3.0.0). Obviously someone in R-core had the same wish as you did.
As documented in the R-devel NEWS file:
Matrix indexing of dataframes by two-column numeric indices is now supported for replacement as well as extraction.
A demonstration:
dat[sel.mat]
## [1] 0.3355509 0.4114056 0.2334332 0.6597042 0.7707762 0.7783584
dat[sel.mat] <- NA
dat[sel.mat]
## [1] NA NA NA NA NA NA
R.version.string
# [1] "R Under development (unstable) (2012-12-29 r61478)"
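Code that has to run on both older and newer versions could check the running version first. A hedged sketch, assuming the 3.0.0 cut-off quoted above and using a logical matrix as the fallback; the seed is arbitrary:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
N <- 12
N.NA <- 6
dat <- data.frame(V1 = runif(N), V2 = runif(N))
sel.mat <- matrix(c(sample(seq(N), N.NA),
                    sample(ncol(dat), N.NA, replace = TRUE)), ncol = 2)

if (getRversion() >= "3.0.0") {
  # Two-column numeric matrix replacement is supported natively
  dat[sel.mat] <- NA
} else {
  # Fallback for older versions: mark the cells in a logical matrix
  log.mat <- matrix(FALSE, nrow(dat), ncol(dat))
  log.mat[sel.mat] <- TRUE
  is.na(dat) <- log.mat
}

dat[sel.mat]
```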
In R, the expressions
dat[sel.mat]
dat[sel.mat] <- NA
dispatch to S3 methods and are equivalent to
`[.data.frame`(x=dat, i=sel.mat)
`[<-.data.frame`(x=dat, i=sel.mat, value=NA)
since class(dat) is "data.frame".
You may look into the source code of
`[.data.frame`
`[<-.data.frame`
and modify them to do what you want.
In your case, maybe you want:
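`[<-.data.frame` <- function(x, i, j, value) {
  base.method <- base::`[<-.data.frame`
  # Only intercept x[m] <- value where m is a two-column numeric matrix.
  # is.matrix()/is.numeric() are used rather than comparing class(i),
  # which has length two for matrices in R >= 4.0.0.
  if (nargs() != 3L || missing(i) || !is.matrix(i) || !is.numeric(i) || ncol(i) != 2L) {
    # Forward with the same argument count: the base method uses nargs()
    # to tell x[i] <- value apart from x[i, ] <- value.
    if (nargs() == 3L) return(base.method(x, i, value = value))
    return(base.method(x, i, j, value))
  }
  # check the length of i and value here
  if (nrow(i) %% length(value) != 0) warning("number of index rows is not a multiple of the replacement length")
  value <- rep(value, length.out = nrow(i))  # recycle or truncate to one value per index row
  for (index in seq_len(nrow(i))) {
    x[i[index, 1], i[index, 2]] <- value[index]
  }
  x
}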
Try it:
N <- 12
N.NA <- 6
dat <- data.frame(V1=runif(N),V2=runif(N))
sel.mat <- matrix(c(sample(seq(N),N.NA),sample(ncol(dat),N.NA,replace=TRUE)),ncol=2)
dat[sel.mat] <- NA
dat