How to remove duplicated rows by a column in an R matrix

Tags: r, duplicates, matrix

I am trying to remove duplicated rows by one column (e.g., the 1st column) in an R matrix. How can I extract the unique set by one column from a matrix? I tried

x_1 <- x[unique(x[,1]),]

While the size is correct, all of the values are NA. So instead, I tried

x_1 <- x[-duplicated(x[,1]),]

But the dimensions were incorrect.

asked Jul 26 '11 19:07

verda

2 Answers

I think you're confused about how subsetting works in R. unique(x[,1]) returns the set of unique values in the first column. If you then subset using those values, R treats them as row numbers of the matrix. So you're likely getting NAs because the values refer to rows that don't exist in the matrix.

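Your other attempt runs afoul of the fact that duplicated returns a boolean vector, not a vector of indices. So putting a minus sign in front of it converts it to a vector of 0's and -1's, which again R interprets as referring to rows. In a row index, zeros are ignored and -1 drops row 1, so whenever any duplicate exists only the first row is removed, which is why the dimensions are wrong.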
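The coercion described above is easy to check on a small example. This is only a sketch: the matrix x and its column names key and val are made up for illustration and are not the asker's data.

```r
# Hypothetical 5 x 2 matrix; column 1 ("key") contains duplicates.
x <- cbind(key = c(3, 3, 7, 7, 9), val = 1:5)

dup <- duplicated(x[, 1])   # logical vector: TRUE marks a repeated key
neg <- -dup                 # unary minus coerces TRUE/FALSE to -1/0

# Used as a row index, the zeros are ignored and -1 drops row 1,
# so only the first row disappears and the duplicates stay.
wrong <- x[neg, ]
dim(wrong)

# unique() returns the key values themselves, not row positions.
unique(x[, 1])
```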

Try replacing the '-' with a '!' in front of duplicated, which is the boolean negation operator. Something like this:

m <- matrix(runif(100),10,10)
m[c(2,5,9),1] <- 1
m[!duplicated(m[,1]),]

answered Oct 16 '22 01:10

joran

As you need the indices of the unique rows, use duplicated as you tried. The problem was using - instead of !, so try:

x[!duplicated(x[,1]),]
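For a reproducible check, here is a minimal sketch on a made-up matrix (x, key and val are hypothetical names). Adding drop = FALSE is my own suggestion, not part of the answer above: it keeps the result a matrix even when only one row survives.

```r
# Hypothetical 5 x 2 matrix; column 1 ("key") contains duplicates.
x <- cbind(key = c(3, 3, 7, 7, 9), val = 1:5)

# TRUE for the first occurrence of each key, FALSE for later repeats.
keep <- !duplicated(x[, 1])

# Logical row index; drop = FALSE prevents collapsing to a vector.
x_1 <- x[keep, , drop = FALSE]

# The same rows expressed as positive integer positions.
rows <- which(keep)

# Keep the last occurrence of each key instead of the first.
x_last <- x[!duplicated(x[, 1], fromLast = TRUE), , drop = FALSE]
```

Here x_1 holds rows 1, 3 and 5 of x, while x_last holds rows 2, 4 and 5.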

answered Oct 16 '22 01:10

daroczig
