I have a data set of dimension 401 x 5677. Some columns of this matrix are identical but have different column names. I want to keep only one column from each group of repeated columns, and also get the index j of every column that was removed.
Let us use the following example matrix:
B=matrix(c(1,4,0,2,56,7,1,4,0,33,2,5), nrow=3)
colnames(B)<-c("a","b","c","d")
What I did so far (on my real matrix G) is:
corrG<-cor(G)
Gtest=G
for (i in 1:nrow(corrG)){
  for (j in 1:ncol(corrG)){
    if (i<j && corrG[i,j]==1){
      Gtest[,j]=NA
    }
  }
}
Gfinal<-Gtest[,complete.cases(t(Gtest))]
My code returns a matrix that still contains (!) some duplicated columns. Any help?
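One likely reason the correlation test is unreliable: a correlation of 1 only means two columns are perfectly linearly related, not that they are identical, and comparing a floating-point correlation to 1 with `==` can fail for columns that really are identical. The sketch below illustrates both points with hypothetical toy vectors `x` and `y` and a hypothetical helper `near_one` (none of these come from the real matrix G).

```r
# Hypothetical toy vectors, not taken from the real matrix G.
x <- c(1, 4, 0)
y <- 2 * x + 3   # different values, but perfectly linearly related to x

# The columns are not duplicates ...
identical(x, y)

# ... yet their correlation equals 1 up to rounding error.
isTRUE(all.equal(cor(x, y), 1))

# Exact comparison of doubles with == is fragile. If correlation is used
# at all, a tolerance-based check (hypothetical helper) is safer.
near_one <- function(r, tol = 1e-8) !is.na(r) & abs(r - 1) < tol
near_one(cor(x, x))
```

Note also that `complete.cases(t(Gtest))` drops every column containing an NA, so columns of G that already had missing values would be removed along with the flagged duplicates.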
Try the duplicated() function on the transpose of the matrix:
duplicated.columns <- duplicated(t(your.matrix))
new.matrix <- your.matrix[, !duplicated.columns]
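Applied to the example matrix B from the question, the same idea can also return the indices of the removed columns, which the question asks for. This is a minimal sketch; the names `dup`, `removed_idx`, `removed_names`, `B_unique`, `col_keys` and `kept_for_removed` are made up for illustration.

```r
# Example matrix from the question: columns "a" and "c" are identical.
B <- matrix(c(1, 4, 0, 2, 56, 7, 1, 4, 0, 33, 2, 5), nrow = 3)
colnames(B) <- c("a", "b", "c", "d")

# TRUE for every column that repeats an earlier column.
dup <- duplicated(t(B))

removed_idx   <- which(dup)         # positions j of the dropped columns
removed_names <- colnames(B)[dup]   # their column names

# drop = FALSE keeps the result a matrix even if only one column remains.
B_unique <- B[, !dup, drop = FALSE]

# For each removed column, the index of the first identical column.
# Assumption: pasting the values gives a usable key; paste() formats
# doubles to a limited number of significant digits, so treat this
# mapping as a heuristic for values that differ only by rounding.
col_keys <- apply(B, 2, paste, collapse = "\r")
kept_for_removed <- match(col_keys[dup], col_keys)
```

Here `removed_idx` is 3 (column "c"), which duplicates column 1 ("a"). On the real data, replace B with G.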