My data frame looks like this:
df <- data.frame(gene=c("A","B","C","A","B","D"),
origin=rep(c("old","new"),each=3),
value=sample(rnorm(10,2),6))
gene origin value
1 A old 1.5566908
2 B old 1.3000358
3 C old 0.7668213
4 A new 2.5274712
5 B new 2.2434525
6 D new 2.0758326
I want to find the genes common to both origin groups (old and new), so that my data looks like this:
gene origin value
1 A old 1.5566908
2 B old 1.3000358
4 A new 2.5274712
5 B new 2.2434525
Any help is appreciated. Ideally, I would like to find common rows among groups using multiple columns.
Using bracket notation on an R data frame (df[rows, columns]), you can select rows by column value, by index, by name, or by condition. The base R function subset() gives the same results in most cases, and the dplyr package provides filter(), which keeps the rows that satisfy one or more conditions.
group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping. The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions.
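The row-selection styles above differ in one detail worth knowing: how a condition that evaluates to NA is handled. The base R sketch below uses a small hypothetical data frame (df_na) with one missing value. Bracket indexing returns an all-NA row for it, while subset() drops it (dplyr::filter() also drops rows whose condition is NA). Wrapping the condition in which() is a common way to get the subset() behavior with brackets.

```r
# Hypothetical data with one missing value on purpose
df_na <- data.frame(
  gene   = c("A", "B", "C"),
  origin = c("old", "old", "new"),
  value  = c(1.5, NA, 0.8)
)

# Bracket notation: the NA in the logical index yields an all-NA row
by_bracket <- df_na[df_na$value > 1, ]
nrow(by_bracket)  # 2: the "A" row plus an all-NA row

# subset() treats an NA condition as FALSE and drops that row
by_subset <- subset(df_na, value > 1)
nrow(by_subset)   # 1

# which() returns only the TRUE positions, so brackets now drop the NA too
by_which <- df_na[which(df_na$value > 1), ]
nrow(by_which)    # 1
```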
A base R option using ave() + subset():
subset(
  df,
  # keep rows whose gene has both "old" and "new" among its origins
  as.logical(ave(origin, gene, FUN = function(x) all(c("old", "new") %in% x)))
)
gives:
gene origin value
1 A old 0.5994593
2 B old 4.0449345
4 A new 3.2478612
5 B new 0.2673525
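To see why this answer wraps ave() in as.logical(), here is a sketch of the intermediate step. It uses the question's columns with fixed values in place of rnorm() (an assumption, so the result is deterministic) and assumes R >= 4.0, so that origin is a character column. ave() writes each group's result back into a copy of its first argument, so with a character origin the per-gene TRUE/FALSE comes back as strings.

```r
df <- data.frame(
  gene   = c("A", "B", "C", "A", "B", "D"),
  origin = rep(c("old", "new"), each = 3),
  value  = c(1.56, 1.30, 0.77, 2.53, 2.24, 2.08)  # fixed illustrative values
)

# FUN runs once per gene; its single TRUE/FALSE is written back into every
# position of that gene in a copy of origin, so it is coerced to a string
flag_chr <- ave(df$origin, df$gene, FUN = function(x) all(c("old", "new") %in% x))
flag_chr  # "TRUE" "TRUE" "FALSE" "TRUE" "TRUE" "FALSE"

# as.logical() turns the strings back into a logical row filter
keep <- as.logical(flag_chr)
df[keep, ]
```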
You can use split() and reduce() to get the common genes, and then use them in filter():
library(dplyr)
library(purrr)
df %>% filter(gene %in% (split(df$gene, df$origin) %>% reduce(intersect)))
# gene origin value
#1 A old 1.271
#2 B old 2.838
#3 A new 0.974
#4 B new 1.375
Or, staying in base R:
subset(df, gene %in% Reduce(intersect, split(df$gene, df$origin)))
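Both versions of this answer rely on the same two steps, shown separately below in base R with fixed illustrative values (and assuming R >= 4.0, so gene is character). Because Reduce() folds intersect() over however many groups split() produces, the same code keeps only genes present in every origin even when there are more than two.

```r
df <- data.frame(
  gene   = c("A", "B", "C", "A", "B", "D"),
  origin = rep(c("old", "new"), each = 3),
  value  = c(1.56, 1.30, 0.77, 2.53, 2.24, 2.08)  # fixed illustrative values
)

# Step 1: one vector of genes per origin; the list is named by the sorted
# levels of origin ("new", then "old")
by_origin <- split(df$gene, df$origin)

# Step 2: fold intersect() over the list to keep genes found in every group
common <- Reduce(intersect, by_origin)
common  # "A" "B"

# Step 3: keep the rows whose gene is in the common set
subset(df, gene %in% common)
```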
One possibility could be:
library(dplyr)
df %>%
  group_by(gene) %>%
  # keep genes observed in both origins
  filter(all(c("old", "new") %in% origin))
gene origin value
<chr> <chr> <dbl>
1 A old 1.63
2 B old 0.904
3 A new 2.18
4 B new 1.24
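The question also asks about matching on several columns. With dplyr, listing more columns in group_by() (for example group_by(gene, transcript)) and keeping the same filter() should do this. The base R sketch below instead extends the split() + Reduce(intersect) idea by pasting the key columns into one composite key per row. The transcript column, the df2 data, and the helper common_rows() are all hypothetical, and the "\x1f" separator is assumed never to occur inside the key values.

```r
# Hypothetical data: a row is identified by gene AND transcript
df2 <- data.frame(
  gene       = c("A", "A", "B", "A", "A", "B"),
  transcript = c("t1", "t2", "t1", "t1", "t3", "t1"),
  origin     = rep(c("old", "new"), each = 3),
  value      = c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6)
)

# Hypothetical helper: keep rows whose key-column combination occurs in
# every level of the group column
common_rows <- function(data, keys, group) {
  # One composite key per row; "\x1f" (ASCII unit separator) is assumed
  # not to appear inside any key value
  key <- do.call(paste, c(data[keys], sep = "\x1f"))
  # Composite keys present in every group
  shared <- Reduce(intersect, split(key, data[[group]]))
  data[key %in% shared, ]
}

res <- common_rows(df2, keys = c("gene", "transcript"), group = "origin")
res
```

Only the (A, t1) and (B, t1) combinations occur in both origins, so the A/t2 and A/t3 rows are dropped even though gene A itself is common to both groups.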
I would filter on duplicates, scanning from both the last and the first occurrence:
library(tidyverse)
df %>% filter(
duplicated(gene, fromLast = TRUE) | duplicated(gene, fromLast = FALSE)
)
gene origin value
1 A old 2.665606
2 B old 1.565466
3 A new 4.025450
4 B new 2.647110
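One caveat with the duplicated() approach: it keeps any gene that appears more than once anywhere, which matches "common to both origins" only when each gene occurs at most once per origin. The hypothetical data below repeats gene C within "old" only; duplicated() keeps it, while the split() + Reduce(intersect) check does not.

```r
# Hypothetical data: gene "C" appears twice, but only in the "old" group
df3 <- data.frame(
  gene   = c("A", "C", "C", "A", "D"),
  origin = c("old", "old", "old", "new", "new"),
  value  = c(1, 2, 3, 4, 5)
)

# Duplicate-based filter: any gene seen more than once, in any group
dup_rows <- df3[duplicated(df3$gene) | duplicated(df3$gene, fromLast = TRUE), ]
dup_rows$gene  # "A" "C" "C" "A"

# Group-based filter: only genes present in every origin
common <- Reduce(intersect, split(df3$gene, df3$origin))
in_all <- df3[df3$gene %in% common, ]
in_all$gene    # "A" "A"
```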
Note: I can't replicate your exact values because you didn't set a seed!
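For a reproducible example, call set.seed() before any random draws. The sketch below wraps the question's data-frame code in a hypothetical helper, make_df(), and shows that the same seed gives identical data. The particular numbers for a given seed can still differ between R versions with different default RNG or sampling algorithms (sample() changed in R 3.6.0), so stating the R version also helps.

```r
# Hypothetical helper wrapping the question's data-frame code
make_df <- function() {
  data.frame(
    gene   = c("A", "B", "C", "A", "B", "D"),
    origin = rep(c("old", "new"), each = 3),
    value  = sample(rnorm(10, 2), 6)
  )
}

set.seed(123)   # any fixed integer works
first <- make_df()
set.seed(123)   # same seed, so the same random draws
second <- make_df()
identical(first, second)  # TRUE
```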