
How to remove duplicated rows by a column in an R matrix

I am trying to remove duplicated rows by one column (e.g., the 1st column) in an R matrix. How can I extract the rows that are unique with respect to one column? I tried

x_1 <- x[unique(x[,1]),]

While the size is correct, all of the values are NA. So instead, I tried

x_1 <- x[-duplicated(x[,1]),]

But the dimensions were incorrect.

verda asked Jul 26 '11 19:07




2 Answers

I think you're confused about how subsetting works in R. unique(x[,1]) returns the set of unique values in the first column, not the positions of the rows that hold them. If you then subset with those values, R treats them as row indices. So you're likely getting NAs because the values refer to rows that don't exist in the matrix.
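
To make this concrete, here is a minimal sketch using a made-up matrix (the values are assumptions for illustration, not the asker's data). It shows that unique() hands back column values, which R then uses as row numbers:

```r
# Hypothetical 4 x 2 matrix; the first column holds 3, 3, 1, 2.
x <- matrix(c(3, 3, 1, 2,
              10, 20, 30, 40), ncol = 2)

vals <- unique(x[, 1])   # the distinct values 3, 1, 2 (not row positions)
picked <- x[vals, ]      # same as x[c(3, 1, 2), ]: rows 3, 1 and 2
```

In this toy case every value happens to be a valid row number, so no NAs appear, but the result is still wrong: it keeps both rows whose first column is 3 and leaves out row 4.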

Your other attempt runs afoul of the fact that duplicated returns a boolean vector, not a vector of indices. Putting a minus sign in front of it coerces it to a numeric vector of 0s and -1s, which R again interprets as row indices: the 0s are ignored and each -1 drops the first row, so the dimensions come out wrong.
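
A small sketch of the difference, again with a made-up matrix (the data and the variable names dup, neg, wrong and right are hypothetical). The drop = FALSE argument is an addition here, used so the result stays a matrix even if only one row survives:

```r
# Hypothetical 5 x 2 matrix; rows 1, 3 and 4 share the first-column value 1.
x <- matrix(c(1, 2, 1, 1, 3,
              11, 12, 13, 14, 15), ncol = 2)

dup <- duplicated(x[, 1])          # FALSE FALSE TRUE TRUE FALSE
neg <- -dup                        # 0 0 -1 -1 0

wrong <- x[neg, , drop = FALSE]    # zeros ignored, -1 drops row 1: rows 2 to 5
right <- x[!dup, , drop = FALSE]   # logical index: rows 1, 2 and 5
```

Here wrong has four rows and still contains a repeated first-column value, while right has one row per distinct value.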

Try replacing the '-' in front of duplicated with '!', the logical negation operator. Something like this:

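m <- matrix(runif(100), 10, 10)   # 10 x 10 matrix of random values
m[c(2, 5, 9), 1] <- 1             # give rows 2, 5 and 9 the same first-column value
m[!duplicated(m[, 1]), ]          # keep the first row for each first-column value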
joran answered Oct 16 '22 01:10


As you need the indices of the unique rows, use duplicated as you tried. The problem was using - instead of !, so try:

x[!duplicated(x[,1]),]
daroczig answered Oct 16 '22 01:10