
How to create a vector from the elements in common across vectors in R

Tags: r, vector, upsetr

I have several character vectors, one per gene, each containing the names of the species in which that gene is found, and I made an UpSetR plot to show the number of species in common across genes. Now I'd like to do the opposite and plot the number of genes in common across species, but I don't know how to do it.

Example of what I have:

gene1 <- c("Panda", "Dog", "Chicken")
gene2 <- c("Human", "Panda", "Dog")
gene3 <- c("Human", "Panda", "Chicken")
# ... about 20+ genes with 100+ species each

Example of what I would like to have as a result:

Panda <- c("gene1", "gene2", "gene3")
Dog <- c("gene1", "gene2")
Human <- c("gene2", "gene3")
Chicken <- c("gene1", "gene3")
# ...

I know this is conceptually simple, but I can't work out how to implement it. Can anyone give me a hint?

Thank you!

Guillermo Reales asked Mar 19 '18 19:03


2 Answers

You can use unstack from base R:

unstack(stack(mget(ls(pattern = "gene"))), ind ~ values)
$Chicken
[1] "gene1" "gene3"

$Dog
[1] "gene1" "gene2"

$Human
[1] "gene2" "gene3"

$Panda
[1] "gene1" "gene2" "gene3"

You can then assign each element of the result to its own variable in the global environment with the list2env function.

Breakdown:

l <- mget(ls(pattern = "gene"))       # collect all the gene vectors into a named list
m <- unstack(stack(l), ind ~ values)  # stack into a long data frame, then unstack by species
m
$Chicken
[1] "gene1" "gene3"

$Dog
[1] "gene1" "gene2"

$Human
[1] "gene2" "gene3"

$Panda
[1] "gene1" "gene2" "gene3"

list2env(m, .GlobalEnv)
Dog
[1] "gene1" "gene2"
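
A variation on the same idea, as a minimal sketch. It assumes the gene vectors are collected in an explicitly named list (called `genes` here, a name chosen only for this example) instead of being looked up with `ls(pattern = "gene")`, which would also match any other object whose name contains "gene". It uses `split()` in place of `unstack()`, because `unstack()` returns a data frame instead of a list when every species happens to occur in the same number of genes.

```r
# Example data from the question, stored in a named list
# (the name `genes` is an assumption made for this sketch).
genes <- list(
  gene1 = c("Panda", "Dog", "Chicken"),
  gene2 = c("Human", "Panda", "Dog"),
  gene3 = c("Human", "Panda", "Chicken")
)

# stack() yields a data frame with `values` (species) and `ind` (gene name).
long <- stack(genes)

# split() groups the gene names by species and always returns a list.
species <- split(as.character(long$ind), long$values)

species$Panda
lengths(species)
```

The result is ordered alphabetically by species name, and `lengths(species)` gives the number of genes per species directly.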
KU99 answered Sep 22 '22 15:09


First of all, I think that for most purposes it's better to store the gene vectors in a list, as in

genes <- list(gene1 = gene1, gene2 = gene2, gene3 = gene3)

Then one base R approach would be

genes.v <- unlist(genes)
names(genes.v) <- rep(names(genes), times = lengths(genes))
species <- lapply(unique(genes.v), function(g) names(genes.v)[g == genes.v])
names(species) <- unique(genes.v)
species
# $Panda
# [1] "gene1" "gene2" "gene3"
#
# $Dog
# [1] "gene1" "gene2"
#
# $Chicken
# [1] "gene1" "gene3"
#
# $Human
# [1] "gene2" "gene3"

genes.v is a named vector of all the species, with the genes as their names. However, because each gene contains more than one species, unlist appends an index to the names, so that, e.g., the species from gene1 are named gene11, gene12, and so on. That's what I fix in the second line. In the third line I go over all the unique species and, for each one, collect the names of the genes that contain it; in the fourth line I name the elements of the resulting list by species.
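
Since the original goal was to plot the number of genes shared across species, the same information can also be summarised numerically. The sketch below is an illustration on the question's example data, not part of the approach above: it builds a species-by-gene membership table with `table()` and multiplies it by its transpose to count the genes shared by each pair of species. It assumes that no species is listed twice within the same gene; the names `membership`, `genes_per_species`, and `shared` are made up for this example.

```r
genes <- list(
  gene1 = c("Panda", "Dog", "Chicken"),
  gene2 = c("Human", "Panda", "Dog"),
  gene3 = c("Human", "Panda", "Chicken")
)

# Species-by-gene membership table: 1 if the species occurs in the gene.
long <- stack(genes)
membership <- table(species = long$values, gene = long$ind)

# Number of genes in which each species occurs.
genes_per_species <- rowSums(membership)

# Genes shared by each pair of species (the diagonal repeats the per-species counts).
shared <- tcrossprod(membership)

genes_per_species
shared["Panda", "Dog"]
```

If the UpSetR package is installed, the inverted list (`species` above) should be usable as input to `UpSetR::fromList()` and then `UpSetR::upset()`, in the same way as the original list of genes.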

Julius Vainora answered Sep 21 '22 15:09