I have two matrices, and I want to count how many values each row of matrix1 has in common with each row of matrix2:
Intersection <- function(matrix1, matrix2) {
  # result[i, j] = number of distinct values shared by
  # row i of matrix1 and row j of matrix2
  result <- matrix(0L, nrow = nrow(matrix1), ncol = nrow(matrix2))
  for (i in seq_len(nrow(matrix1))) {
    for (j in seq_len(nrow(matrix2))) {
      result[i, j] <- length(intersect(matrix1[i, ], matrix2[j, ]))
    }
  }
  return(result)
}
How can I vectorize this function to avoid the loops?
Here is some sample data to experiment with:
> dput(matrix1)
structure(c(1L, 20L, 2L, 1L, 7L, 2L, 22L, 12L, 2L, 27L, 3L, 35L, 16L, 3L, 32L, 4L, 37L, 35L, 17L, 33L, 5L, 38L, 46L, 27L, 49L), .Dim = c(5L, 5L))
> dput(matrix2)
structure(c(1, 14, 7, 1, 7, 2, 22, 12, 2, 27, 7, 35, 16, 3, 32, 14, 39, 35, 17, 32, 17, 38, 46, 20, 49), .Dim = c(5L, 5L))
The way to improve processing efficiency is not to throw away the loops but to examine their inner logic. In this case it appears you want to use the number of elements that TARGET's column i shares with mat's column j as an offset to pick an element from the "IF_n" columns, and to place that item in row 5+i, column j. We should be able to get rid of all those ifelse statements once the problem is described in that manner. (I often find that spending time restating the problem in the clearest possible natural language is the key to improving efficiency.) There will be a bit of modulo arithmetic trickery involved in getting the 0 result to index the fifth column.
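To make the modulo step concrete, here is a small sketch of the index expression that BDfunc further down uses, 2 + (n %% 5), where n stands for the intersection count. The vector n and the name col_index are only for illustration; which result lines up with which "IF_n" column depends on the layout of df, which is not shown here.

```r
n <- 0:5                    # possible intersection counts for 5-element vectors
col_index <- 2 + (n %% 5)   # same index expression as in BDfunc
print(col_index)            # counts 0 and 5 wrap around to the same column
```

The point is that a single arithmetic expression replaces a chain of ifelse tests on the count.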
I also have a problem with the logic of asking for the length of the intersection of df$TARGET[i] with a mat column. df$TARGET[i] can only ever be a single number, since you used vector indexing rather than matrix indexing. (df$TARGET is a matrix, so it should be df$TARGET[,i].)
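The difference between the two indexing styles is easy to see on a toy matrix. This is only an illustration; m, single and column are made-up names, not objects from the question.

```r
m <- matrix(1:6, nrow = 2)   # toy 2 x 3 matrix, filled column by column
single <- m[3]               # vector indexing: the third element only
column <- m[, 3]             # matrix indexing: the whole third column
print(length(single))
print(length(column))
```

Intersecting a single number with a column can only ever give a length of 0 or 1, which is why the matrix form is needed.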
This is my counter-proposal. I think it is both more in keeping with the desired outcome and probably at least 5 times faster, since you can completely eliminate all that ifelse folderol.
BDfunc <- function(df, mat) {
  for (i in 1:nrow(df)) {       # print(i)  (use for debugging)
    for (j in 1:ncol(mat)) {    # print(j)
      # the intersection count, modulo 5, is the offset into the columns of df
      mat[5 + i, j] <- df[i, 2 + (
        length(intersect(df$TARGET[, i], mat[, j])) %% 5)]
    }
  }
  return(mat)
}
mat <- BDfunc(df, mat)
> mat
          [,1]      [,2]      [,3]      [,4]      [,5]
 [1,] 1.000000 20.000000  2.000000  1.000000  7.000000
 [2,] 2.000000 22.000000 12.000000  2.000000 27.000000
 [3,] 3.000000 35.000000 16.000000  3.000000 32.000000
 [4,] 4.000000 37.000000 35.000000 17.000000 33.000000
 [5,] 5.000000 38.000000 46.000000 27.000000 49.000000
 [6,] 5.855105  2.216690  7.458434  3.120932  2.216690
 [7,] 6.381849  6.381849  6.630405  6.381849  6.630405
 [8,] 2.464372  2.464372  2.464372  5.993037  5.993037
 [9,] 1.614552  1.614552  1.614552  5.507400  1.614552
[10,] 2.088811  2.088811  2.088811  2.088811  5.974585
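For the plain row-against-row count on matrix1 and matrix2, one loop-free possibility is to turn each matrix into a 0/1 indicator matrix (one column per distinct value) and multiply the two. This is a sketch rather than a benchmarked solution: count_shared and indicator are made-up names, and it assumes that duplicates within a row should count once, as they do with length(intersect()).

```r
# Sample data copied from the dput() output above.
matrix1 <- structure(c(1L, 20L, 2L, 1L, 7L, 2L, 22L, 12L, 2L, 27L, 3L, 35L,
                       16L, 3L, 32L, 4L, 37L, 35L, 17L, 33L, 5L, 38L, 46L,
                       27L, 49L), .Dim = c(5L, 5L))
matrix2 <- structure(c(1, 14, 7, 1, 7, 2, 22, 12, 2, 27, 7, 35, 16, 3, 32,
                       14, 39, 35, 17, 32, 17, 38, 46, 20, 49),
                     .Dim = c(5L, 5L))

# count_shared is a hypothetical helper name, not from the original post.
count_shared <- function(m1, m2) {
  vals <- union(m1, m2)   # every distinct value found in either matrix
  # 0/1 matrix: one row per matrix row, one column per distinct value
  indicator <- function(m) {
    out <- matrix(0L, nrow = nrow(m), ncol = length(vals))
    out[cbind(as.vector(row(m)), match(m, vals))] <- 1L
    out
  }
  # entry [i, j] counts values present in row i of m1 and row j of m2
  tcrossprod(indicator(m1), indicator(m2))
}

shared <- count_shared(matrix1, matrix2)
print(shared)
```

The indicator matrices grow with the number of distinct values, so for data with many distinct values the memory cost may outweigh the gain over the loop; timing both on real data is the only way to tell.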