I have a data frame with >100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it to work with unique or duplicated myself.

In the example below, I would like to keep only the rows that are unique with respect to id and id2:
data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))

id id2 somevalue
 1   1         x
 1   1         y
 3   4         z
I would like to obtain either:
id id2 somevalue
 1   1         x
 3   4         z
or:
id id2 somevalue
 1   1         y
 3   4         z
(I have no preference which of the unique rows is kept)
As an aside on the pandas side of things: a pandas Series (i.e. a single column) has a unique() method that returns the distinct values in that column. To find the unique values across several columns, you can concatenate those columns into one Series with pandas' concat() and then call unique() on the resulting Series.
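For this question's R data frame, the same "combine the columns into one key, then deduplicate" idea can be sketched in base R (my translation using paste() to build the key; this is not code from the original post):

dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))

# Combine the two columns of interest into a single key per row.
key <- paste(dat$id, dat$id2)

unique(key)               # the distinct id/id2 combinations, as strings: "1 1" "3 4"
dat[!duplicated(key), ]   # or keep one full row per combination (rows 1 and 3: x and z)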
Ok, if it doesn't matter which value in the non-duplicated column you select, this should be pretty easy:
dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))

> dat[!duplicated(dat[, c('id','id2')]), ]
  id id2 somevalue
1  1   1         x
3  3   4         z
Inside the duplicated call, I'm simply passing only those columns of dat that I don't want duplicates of. This code always keeps the first occurrence of any duplicated combination (in this case, the x row).
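If you would rather keep the later of the duplicated rows (the y row), duplicated() also takes a fromLast argument, so something like this should work on the same data:

> dat[!duplicated(dat[, c('id','id2')], fromLast = TRUE), ]
  id id2 somevalue
2  1   1         y
3  3   4         z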