Combine group_by and distinct

Tags: r, dplyr

I have a data.frame with two variables, id.x and id.y, whose combination uniquely identifies each row, although each individual value is repeated many times in the dataset.

I would like to use dplyr to group_by id.x such that each id.x is matched with a distinct id.y.

Edit: I edited the example to highlight the differing numbers of unique id.x and id.y values.

An example:

  id.x id.y
    a    o
    a    p
    a    q
    c    o
    c    p
    c    q

The desired result would be:

 id.x id.y
    a    o
    c    q

dput output for the example:

structure(list(id.x = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a", 
"c"), class = "factor"), id.y = structure(c(1L, 2L, 3L, 1L, 2L, 
3L), .Label = c("o", "p", "q"), class = "factor")), .Names = c("id.x", 
"id.y"), row.names = c(NA, -6L), class = "data.frame")

Edit: If my desired result can be accomplished without group_by or distinct, that is fine too. I also use data.table, so a data.table solution would be fine as well.

asked Jun 11 '15 17:06

bjoseph

1 Answer

Using dplyr

library(dplyr)
df %>% filter(dense_rank(id.x) == dense_rank(id.y))

which returns

  id.x id.y
1    a    o
2    c    p
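The filter works because dense_rank numbers the distinct values of each column 1, 2, 3, ... without gaps, so keeping the rows where both ranks agree pairs the k-th distinct id.x with the k-th distinct id.y. Since the question also allows data.table, the same idea might be sketched as below. This is only a sketch under assumptions: it rebuilds the example data with character columns rather than factors, the helper columns rank.x and rank.y are hypothetical names introduced for illustration, and it assumes a data.table version that provides frank().

```r
library(data.table)

# Example data from the question (assumption: plain character columns
# instead of the factors in the dput output).
dt <- data.table(
  id.x = c("a", "a", "a", "c", "c", "c"),
  id.y = c("o", "p", "q", "o", "p", "q")
)

# Dense ranks number the distinct values 1, 2, 3, ... with no gaps:
# a -> 1, c -> 2 and o -> 1, p -> 2, q -> 3.
# rank.x and rank.y are hypothetical helper columns for illustration.
dt[, rank.x := frank(id.x, ties.method = "dense")]
dt[, rank.y := frank(id.y, ties.method = "dense")]

# Keep the rows where both ranks agree, i.e. the k-th distinct id.x
# paired with the k-th distinct id.y.
result <- dt[rank.x == rank.y, .(id.x, id.y)]
print(result)
```

Note that this only finds a pair when the row combining the k-th id.x with the k-th id.y actually exists in the data, as it does in the example, and that surplus values (here q) are left unmatched.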

answered Sep 17 '22 00:09

manotheshark
