Understand the `Reduce` function

Tags:

r

reduce

I have a question about the Reduce function in R. I read its documentation, but I am still a bit confused. I have 5 vectors of gene names. For example:

v1 <- c("geneA","geneB",""...)
v2 <- c("geneA","geneC",""...)
v3 <- c("geneD","geneE",""...)
v4 <- c("geneA","geneE",""...)
v5 <- c("geneB","geneC",""...)

And I would like to find out which genes are present in at least two vectors. Some people have suggested:

Reduce(intersect,list(a,b,c,d,e))

I would greatly appreciate it if someone could explain how this statement works, because I have seen Reduce used in other scenarios.

asked Feb 16 '15 16:02

Johnathan

3 Answers

Reduce takes a binary function and a list of data items and successively applies the function to the list elements in a recursive fashion. For example:

Reduce(intersect,list(a,b,c))

is the same as

intersect(intersect(a,b),c)

However, I don't think that construct will help you here as it will only return those elements that are common to all vectors.
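
To make the folding concrete, here is a minimal sketch of what Reduce does in this simple case, written as an explicit loop. The helper name `fold_left` and the sample vectors are made up for illustration; the real Reduce also supports `init`, `right` and `accumulate`, which this sketch ignores.

```r
# Hypothetical helper: a left fold over a non-empty list (no init handling)
fold_left <- function(f, xs) {
  acc <- xs[[1]]
  for (x in xs[-1]) {
    acc <- f(acc, x)  # combine the running result with the next element
  }
  acc
}

# Made-up sample vectors
a  <- c("geneA", "geneB", "geneC")
b  <- c("geneA", "geneC", "geneD")
c3 <- c("geneC", "geneA", "geneE")

by_hand <- intersect(intersect(a, b), c3)
folded  <- fold_left(intersect, list(a, b, c3))
builtin <- Reduce(intersect, list(a, b, c3))

identical(by_hand, folded)   # TRUE
identical(by_hand, builtin)  # TRUE
```

All three expressions perform the same sequence of calls, which is why only elements common to every vector survive.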

To count the number of vectors that a gene appears in you could do the following:

vlist <- list(v1,v2,v3,v4,v5)
addmargins(
  table(gene = unlist(vlist),
        vec  = rep(paste0("v", 1:5), times = sapply(vlist, length))),
  2, list(Count = function(x) sum(x[x > 0])))
       vec
gene    v1 v2 v3 v4 v5 Count
  geneA  1  1  0  1  0     3
  geneB  1  0  0  0  1     2
  geneC  0  1  0  0  1     2
  geneD  0  0  1  0  0     1
  geneE  0  0  1  1  0     2
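
If only the per-gene count matters, a logical membership matrix is one alternative. This is a sketch, not the only way to do it; the two-gene vectors below are assumed for illustration (the question elides the real contents) and are chosen to match the table above.

```r
# Assumed contents: the question only shows the first two genes of each vector
v1 <- c("geneA", "geneB")
v2 <- c("geneA", "geneC")
v3 <- c("geneD", "geneE")
v4 <- c("geneA", "geneE")
v5 <- c("geneB", "geneC")
vlist <- list(v1 = v1, v2 = v2, v3 = v3, v4 = v4, v5 = v5)

genes <- sort(unique(unlist(vlist)))

# One row per gene, one column per vector; TRUE where the gene occurs
member <- sapply(vlist, function(v) genes %in% v)
rownames(member) <- genes

# Number of vectors that contain each gene
n_vec <- rowSums(member)

# Genes present in at least two vectors
names(n_vec)[n_vec >= 2]
# [1] "geneA" "geneB" "geneC" "geneE"
```

Unlike the table of raw occurrences, this counts a vector once even if a gene is repeated inside it.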

answered Oct 24 '22 00:10

James

A nice way to see what Reduce() is doing is to run it with accumulate=TRUE. It then returns a vector or list whose n-th element is the result after processing the first n elements of x. Here are a couple of examples:

Reduce(`*`, x=list(5,4,3,2), accumulate=TRUE)
# [1]   5  20  60 120

i2 <- seq(0,100,by=2)
i3 <- seq(0,100,by=3)
i5 <- seq(0,100,by=5)
Reduce(intersect, x=list(i2,i3,i5), accumulate=TRUE)
# [[1]]
#  [1]   0   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36
# [20]  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74
# [39]  76  78  80  82  84  86  88  90  92  94  96  98 100
# 
# [[2]]
#  [1]  0  6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96
# 
# [[3]]
# [1]  0 30 60 90
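
Two further points may help when reading such output; the following is a small sketch using only base R. With `+`, the accumulated results coincide with cumsum(), and supplying `init` puts the starting value first, so the result has one more element than x.

```r
x <- 1:5

# Running totals: element n is the fold of the first n elements
running <- Reduce(`+`, x, accumulate = TRUE)
running
# [1]  1  3  6 10 15
all(running == cumsum(x))
# [1] TRUE

# With init, the starting value is the first element of the result
with_init <- Reduce(`+`, x, accumulate = TRUE, init = 100)
with_init
# [1] 100 101 103 106 110 115
length(with_init) == length(x) + 1
# [1] TRUE
```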

answered Oct 23 '22 23:10

Josh O'Brien

Assuming the input values given at the end of this answer, the expression

Reduce(intersect,list(a,b,c,d,e))
## character(0)

gives the genes that are present in all vectors, not the genes that are present in at least two vectors. It means:

intersect(intersect(intersect(intersect(a, b), c), d), e)
## character(0)

If we want the genes that are in at least two vectors:

L <- list(a, b, c, d, e)
u <- unlist(lapply(L, unique)) # or:  Reduce(c, lapply(L, unique))

tab <- table(u)
names(tab[tab > 1])
## [1] "geneA" "geneB" "geneC" "geneE"

or

sort(unique(u[duplicated(u)]))
## [1] "geneA" "geneB" "geneC" "geneE"

Note: We used:

a <- c("geneA","geneB")
b <- c("geneA","geneC")
c <- c("geneD","geneE")
d <- c("geneA","geneE")
e <- c("geneB","geneC")
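
The same idea extends to "at least k vectors". The helper below, `genes_in_at_least`, is a hypothetical name introduced here and is not part of base R; it is a sketch that assumes every list element is a character vector.

```r
# Hypothetical helper: values occurring in at least k of the vectors in L
genes_in_at_least <- function(L, k = 2) {
  # unique() within each vector so repeats inside one vector count once
  u <- unlist(lapply(L, unique))
  tab <- table(u)
  names(tab)[tab >= k]
}

L <- list(c("geneA", "geneB"),
          c("geneA", "geneC"),
          c("geneD", "geneE"),
          c("geneA", "geneE"),
          c("geneB", "geneC"))

genes_in_at_least(L, 2)
## [1] "geneA" "geneB" "geneC" "geneE"
genes_in_at_least(L, 3)
## [1] "geneA"
genes_in_at_least(L, length(L))  # same set as Reduce(intersect, L)
## character(0)
```

Setting k to the number of vectors recovers the "present in all vectors" case, which is what the Reduce(intersect, ...) expression computes.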

answered Oct 23 '22 23:10

G. Grothendieck
