Keep common rows among groups based on a column in dplyr

My data frame looks like this:

df <- data.frame(gene=c("A","B","C","A","B","D"), 
                 origin=rep(c("old","new"),each=3),
                 value=sample(rnorm(10,2),6))

  gene origin     value
1    A    old 1.5566908
2    B    old 1.3000358
3    C    old 0.7668213
4    A    new 2.5274712
5    B    new 2.2434525
6    D    new 2.0758326

I want to find the genes that are common to the two origin groups (old and new).

I want my data to look like this:

  gene origin     value
1    A    old 1.5566908
2    B    old 1.3000358
4    A    new 2.5274712
5    B    new 2.2434525

Any help is appreciated. Ideally, I would like to find common rows among groups using multiple columns.

asked Aug 02 '21 13:08

LDT

4 Answers

A base R option using ave + subset:

subset(
  df,
  as.logical(ave(origin, gene, FUN = function(x) all(c("old", "new") %in% x)))
)

gives

  gene origin     value
1    A    old 0.5994593
2    B    old 4.0449345
4    A    new 3.2478612
5    B    new 0.2673525
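To make the ave step easier to follow, here is a minimal sketch of what it returns before subset is applied. It assumes R >= 4.0, where data.frame() keeps origin as a character vector, and uses two standalone vectors in place of the df columns.

```r
# Stand-ins for df$gene and df$origin from the question.
gene   <- c("A", "B", "C", "A", "B", "D")
origin <- c("old", "new")[rep(1:2, each = 3)]

# ave() applies FUN to the origins of each gene and writes the result
# back into a copy of `origin`, one value per row. Because `origin` is
# character, the logical results are stored as "TRUE" / "FALSE" strings.
flags <- ave(origin, gene, FUN = function(x) all(c("old", "new") %in% x))

# as.logical() turns those strings back into a row filter.
keep <- as.logical(flags)
keep
```

Genes C and D each occur in only one origin, so their rows are the ones flagged FALSE.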

answered Oct 22 '22 09:10

ThomasIsCoding

You can use split and reduce to get the common genes, then use them in filter.

library(dplyr)
library(purrr)

df %>% filter(gene %in% (split(df$gene, df$origin) %>% reduce(intersect)))

#  gene origin value
#1    A    old 1.271
#2    B    old 2.838
#3    A    new 0.974
#4    B    new 1.375

Or, staying in base R:

subset(df, gene %in% Reduce(intersect, split(df$gene, df$origin)))
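The intermediate steps of the split / Reduce(intersect) approach can be sketched as follows. The value column uses fixed, made-up numbers (an assumption, since the question's data is random) so that the result is reproducible.

```r
# Deterministic stand-in for the question's data; the values are made up.
df <- data.frame(
  gene   = c("A", "B", "C", "A", "B", "D"),
  origin = rep(c("old", "new"), each = 3),
  value  = c(1.5, 1.3, 0.8, 2.5, 2.2, 2.1)
)

# One character vector of genes per origin:
# $new is "A" "B" "D" and $old is "A" "B" "C".
genes_by_origin <- split(df$gene, df$origin)

# Fold intersect() across the list to get the genes present in every origin.
common_genes <- Reduce(intersect, genes_by_origin)

# Keep only the rows whose gene is in that common set.
result <- subset(df, gene %in% common_genes)
result
```

Because Reduce folds over however many elements the list has, the same code should work unchanged when origin has more than two levels.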

answered Oct 22 '22 08:10

Ronak Shah

One possibility could be:

library(dplyr)

df %>%
    group_by(gene) %>%
    filter(all(c("old", "new") %in% origin))

  gene  origin value
  <chr> <chr>  <dbl>
1 A     old    1.63 
2 B     old    0.904
3 A     new    2.18 
4 B     new    1.24
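If the origin labels are not known in advance, or there are more than two of them, one possible variation is to compare the number of distinct origins per gene with the number of distinct origins overall. This is a sketch, not part of the original answer; the third origin "mid" and the values are made up for illustration.

```r
library(dplyr)

# Example data with three origins; "mid" and all values are hypothetical.
df3 <- data.frame(
  gene   = c("A", "B", "C", "A", "B", "D", "A", "C"),
  origin = c("old", "old", "old", "new", "new", "new", "mid", "mid"),
  value  = c(1.5, 1.3, 0.8, 2.5, 2.2, 2.1, 1.9, 1.1)
)

# Total number of origins, computed once on the ungrouped data.
n_origins <- n_distinct(df3$origin)

# Keep a gene only if it occurs in every origin.
result <- df3 %>%
  group_by(gene) %>%
  filter(n_distinct(origin) == n_origins) %>%
  ungroup()
result
```

Here only gene A occurs in all three origins, so only its three rows are kept.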

answered Oct 22 '22 09:10

tmfmnk

I would filter on duplicated genes, scanning from both the first and the last row so that every occurrence of a repeated gene is flagged.

library(tidyverse)

df %>% filter(
        duplicated(gene, fromLast = TRUE) | duplicated(gene, fromLast = FALSE)
)

  gene origin    value
1    A    old 2.665606
2    B    old 1.565466
3    A    new 4.025450
4    B    new 2.647110

Note: I can't replicate your data because you didn't provide a seed!
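One caveat worth checking with the duplicated filter: it flags any gene that occurs more than once, whether or not the occurrences come from different origins. The sketch below uses made-up data in which gene C appears twice within "old" and compares the result with an intersect-based filter.

```r
# Made-up data: gene C is repeated inside the "old" group only.
df2 <- data.frame(
  gene   = c("A", "B", "C", "C", "A", "B", "D"),
  origin = c("old", "old", "old", "old", "new", "new", "new"),
  value  = c(1.5, 1.3, 0.8, 0.9, 2.5, 2.2, 2.1)
)

# Duplicate-based filter: keeps every gene that occurs at least twice.
dup_rows <- duplicated(df2$gene, fromLast = TRUE) | duplicated(df2$gene)
by_dup <- df2[dup_rows, ]

# Origin-aware filter: keeps genes that occur in every origin.
common_genes <- Reduce(intersect, split(df2$gene, df2$origin))
by_origin <- df2[df2$gene %in% common_genes, ]

# Genes kept by the duplicate filter but absent from at least one origin.
setdiff(by_dup$gene, by_origin$gene)
```

With the question's data, where each gene occurs at most once per origin, the two filters select the same rows.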

answered Oct 22 '22 10:10

Serkan
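The question also asks about matching on multiple columns, which the answers above do not show. A minimal sketch, assuming dplyr >= 1.0.0 and a hypothetical second key column chrom (not in the original data), is to group by all key columns before applying the same filter:

```r
library(dplyr)

# Made-up data; `chrom` is a hypothetical second key column.
df_multi <- data.frame(
  gene   = c("A", "A", "B", "A", "A", "B"),
  chrom  = c("chr1", "chr2", "chr1", "chr1", "chr3", "chr1"),
  origin = rep(c("old", "new"), each = 3),
  value  = c(1.5, 1.3, 0.8, 2.5, 2.2, 2.1)
)

# Columns that together identify a "common" row.
keys <- c("gene", "chrom")

# Keep a key combination only if it occurs in both origins.
result <- df_multi %>%
  group_by(across(all_of(keys))) %>%
  filter(all(c("old", "new") %in% origin)) %>%
  ungroup()
result
```

Only the (A, chr1) and (B, chr1) combinations occur in both origins, so four rows remain.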

Tags: dataframe, r, dplyr, filtering, tidyverse
