Using the following code:
c <- NULL
for (a in 1:4){
b <- seq(from = a, to = a + 5)
c <- rbind(c,b)
}
c <- rbind(c,c); rm(a,b)
results in this matrix:
> c
  [,1] [,2] [,3] [,4] [,5] [,6]
b    1    2    3    4    5    6
b    2    3    4    5    6    7
b    3    4    5    6    7    8
b    4    5    6    7    8    9
b    1    2    3    4    5    6
b    2    3    4    5    6    7
b    3    4    5    6    7    8
b    4    5    6    7    8    9
How can I return row indices for rows matching a specific input?
For example, with a search term of:
z <- c(3,4,5,6,7,8)
I need the following returned:
[1] 3 7
This will be used on a fairly large data frame of test data that includes a time step column, to reduce the data by accumulating the time steps of matching rows.
The question was answered well by others. Because of my dataset size (9.5M rows), I came up with an efficient approach that takes a few steps.
1) Sort the big data frame 'dc' (the time steps to accumulate are in column 1) by the remaining columns.
dc <- dc[order(dc[,2],dc[,3],dc[,4],dc[,5],dc[,6],dc[,7],dc[,8]),]
2) Create a new data frame with unique entries (excluding column 1).
dcU <- unique(dc[,2:8])
3) Write an Rcpp (C++) function that loops through the unique data frame. For each unique row it walks forward through the sorted original data frame, accumulating time while the rows are equal, and moves on to the next unique row as soon as an unequal row is found.
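library(Rcpp)
getTsrc <-
'
NumericVector getT(NumericMatrix dc, NumericMatrix dcU)
{
  int k = 0;            // current row in the sorted full matrix dc
  int m = dc.nrow();
  int n = dcU.nrow();
  NumericVector tU(n);  // accumulated time per unique row, starts at 0
  for (int i = 0; i < n; i++)
  {
    // Advance through dc while key columns 1..7 equal unique row i.
    // The k < m check stops the scan at the last row of dc.
    while (k < m &&
           (dcU(i,0)==dc(k,1))&&(dcU(i,1)==dc(k,2))&&(dcU(i,2)==dc(k,3))&&
           (dcU(i,3)==dc(k,4))&&(dcU(i,4)==dc(k,5))&&(dcU(i,5)==dc(k,6))&&
           (dcU(i,6)==dc(k,7)))
    {
      tU[i] += dc(k,0);  // column 0 holds the time step
      k++;
    }
  }
  return tU;
}
'
cppFunction(getTsrc)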
4) Convert function inputs to matrices.
dc1 <- as.matrix(dc)
dcU1 <- as.matrix(dcU)
5) Run the function and time it (returns time vector matching unique data frame)
pt <- proc.time()
t <- getT(dc1, dcU1)
print(proc.time() - pt)
   user  system elapsed
   0.18    0.03    0.20
6) Self high-five and more coffee.
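For checking the accumulated times on a small sample, the same grouping can be sketched in plain R with aggregate. This is only an illustration and not the Rcpp approach above: the data frame df and the column names dt, x and y are made up for the example, and no claim is made about how it performs on millions of rows.

```r
# Toy data: dt is the time step, x and y are the columns that define a "matching row".
# All names here are hypothetical and only serve the illustration.
df <- data.frame(
  dt = c(1, 2, 3, 4, 5),
  x  = c(1, 1, 2, 2, 1),
  y  = c(10, 10, 20, 20, 10)
)

# Sum dt over every distinct combination of x and y.
res <- aggregate(dt ~ x + y, data = df, FUN = sum)
print(res)
```

Each row of res holds one unique key combination with its total time, which is the same information as dcU plus the vector returned by getT.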
You can use apply.
Here we use apply on c, across rows (the 1), and apply the function function(x) all(x == z) to each row.
The which then pulls out the integer positions of the rows.
which(apply(c, 1, function(x) all(x == z)))
b b
3 7
EDIT: If your real data has problems with this and has only 9 columns (not too much typing), you could try a fully vectorized solution:
which(c[,1]==z[1] & c[,2]==z[2] & c[,3]==z[3] & c[,4]==z[4] & c[,5]==z[5] & c[,6]==z[6])
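If typing out one comparison per column gets tedious, the same vectorized idea can be written without naming the columns. The sketch below is one possible variant, not part of the original answer. It rebuilds the example matrix under the hypothetical name m (to avoid masking the base function c) and relies on R recycling z down the columns of the transposed matrix.

```r
# Rebuild the example matrix from the question; the name m is an assumption.
m <- t(sapply(1:4, function(a) seq(from = a, to = a + 5)))
m <- rbind(m, m)
z <- c(3, 4, 5, 6, 7, 8)

# t(m) has one column per original row, each of length(z), so z is recycled
# against every row of m. A column with zero mismatches is a matching row.
idx <- which(colSums(t(m) != z) == 0)
print(idx)
```

This assumes length(z) equals ncol(m) and that neither contains NA values. If either assumption fails, the recycling or the comparison will not behave as intended.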