
Understand the `Reduce` function

Tags: r, reduce

I have a question about the Reduce function in R. I have read its documentation, but I am still a bit confused. I have five vectors of gene names. For example:

v1 <- c("geneA","geneB",""...)
v2 <- c("geneA","geneC",""...)
v3 <- c("geneD","geneE",""...)
v4 <- c("geneA","geneE",""...)
v5 <- c("geneB","geneC",""...)

And I would like to find out which genes are present in at least two vectors. Some people have suggested:

Reduce(intersect,list(a,b,c,d,e))

I would greatly appreciate it if someone could explain how this statement works, because I have seen Reduce used in other scenarios.

Johnathan asked Feb 16 '15 16:02




3 Answers

Reduce takes a binary function and a list of data items and successively applies the function to the list elements in a recursive fashion. For example:

Reduce(intersect,list(a,b,c))

is the same as

intersect(intersect(a,b),c)

However, I don't think that construct will help you here as it will only return those elements that are common to all vectors.
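The successive application described above can be sketched with a hand-written loop. This is only an illustration, not how Reduce is implemented internally: `my_reduce` is a hypothetical helper, it assumes a non-empty list, and the three vectors are made-up data rather than the question's inputs.

```r
# Hypothetical helper: a hand-written left fold, for illustration only.
# Assumes xs is a non-empty list.
my_reduce <- function(f, xs) {
  acc <- xs[[1]]          # start with the first element
  for (x in xs[-1]) {     # fold in the remaining elements, left to right
    acc <- f(acc, x)
  }
  acc
}

# Illustrative data (not taken from the question)
a  <- c("geneA", "geneB", "geneC")
b  <- c("geneA", "geneC", "geneD")
c3 <- c("geneA", "geneC", "geneE")

nested <- intersect(intersect(a, b), c3)         # written out by hand
folded <- Reduce(intersect, list(a, b, c3))      # the same thing via Reduce
manual <- my_reduce(intersect, list(a, b, c3))   # the same thing via the loop

print(folded)
```

All three results should contain only the genes shared by every vector, which is why this construct answers "in all vectors" rather than "in at least two".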

To count the number of vectors that a gene appears in you could do the following:

vlist <- list(v1,v2,v3,v4,v5)
# Cross-tabulate gene by source vector, then append a per-gene Count column
addmargins(
  table(gene = unlist(vlist),
        vec  = rep(paste0("v",1:5), times = sapply(vlist,length))),
  2,
  list(Count = function(x) sum(x[x>0])))
       vec
gene    v1 v2 v3 v4 v5 Count
  geneA  1  1  0  1  0     3
  geneB  1  0  0  0  1     2
  geneC  0  1  0  0  1     2
  geneD  0  0  1  0  0     1
  geneE  0  0  1  1  0     2
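
Reduce can still be used for the counting itself if the function being folded is `+` rather than `intersect`. The following is a minimal sketch under stated assumptions: each vector is taken to hold exactly the two genes shown in the question (the elided entries are ignored), and `genes`, `membership`, `counts`, and `at_least_two` are names introduced here for illustration.

```r
# Assumed inputs: only the two genes shown for each vector in the question
v1 <- c("geneA", "geneB")
v2 <- c("geneA", "geneC")
v3 <- c("geneD", "geneE")
v4 <- c("geneA", "geneE")
v5 <- c("geneB", "geneC")
vlist <- list(v1, v2, v3, v4, v5)

# All distinct genes across the vectors
genes <- sort(unique(unlist(vlist)))

# One logical vector per input: is each gene present in that vector?
membership <- lapply(vlist, function(v) genes %in% v)

# Adding the logical vectors element-wise gives a per-gene count
counts <- Reduce(`+`, membership)
names(counts) <- genes

# Genes that occur in at least two vectors
at_least_two <- names(counts)[counts >= 2]

print(counts)
print(at_least_two)
```

Because `%in%` tests presence rather than occurrences, a gene repeated within one vector is still counted once for that vector.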
James answered Oct 24 '22 00:10



A nice way to see what Reduce() is doing is to run it with the argument accumulate=TRUE. It then returns a vector or list in which the nth element is the accumulated result after processing the first n elements of x. Here are a couple of examples:

Reduce(`*`, x=list(5,4,3,2), accumulate=TRUE)
# [1]   5  20  60 120

i2 <- seq(0,100,by=2)
i3 <- seq(0,100,by=3)
i5 <- seq(0,100,by=5)
Reduce(intersect, x=list(i2,i3,i5), accumulate=TRUE)
# [[1]]
#  [1]   0   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36
# [20]  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74
# [39]  76  78  80  82  84  86  88  90  92  94  96  98 100
# 
# [[2]]
#  [1]  0  6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96
# 
# [[3]]
# [1]  0 30 60 90
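
As a further small check of the same idea, the sketch below compares the accumulated steps for `+` with base R's `cumsum`, and confirms that the last accumulated step matches what Reduce returns without accumulate. The variable names `x`, `steps`, and `final` are chosen here for illustration.

```r
x <- 1:5

# Every intermediate state of the fold
steps <- Reduce(`+`, x, accumulate = TRUE)

# Only the final state of the fold
final <- Reduce(`+`, x)

print(steps)   # running totals, one per element of x
print(final)   # the overall sum

# The running totals agree with cumsum(), and the last one is the plain result
print(all(steps == cumsum(x)))
print(steps[length(steps)] == final)
```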
Josh O'Brien answered Oct 23 '22 23:10



Assuming the input values given at the end of this answer, the expression

Reduce(intersect,list(a,b,c,d,e))
## character(0)

gives the genes that are present in all vectors, not the genes that are present in at least two vectors. It is equivalent to:

intersect(intersect(intersect(intersect(a, b), c), d), e)
## character(0)

If we want the genes that are in at least two vectors:

L <- list(a, b, c, d, e)
u <- unlist(lapply(L, unique)) # or:  Reduce(c, lapply(L, unique))

tab <- table(u)
names(tab[tab > 1])
## [1] "geneA" "geneB" "geneC" "geneE"

or

sort(unique(u[duplicated(u)]))
## [1] "geneA" "geneB" "geneC" "geneE"

Note: The inputs used were:

a <- c("geneA","geneB")
b <- c("geneA","geneC")
c <- c("geneD","geneE")
d <- c("geneA","geneE")
e <- c("geneB","geneC")
G. Grothendieck answered Oct 23 '22 23:10
