I have a question about the Reduce function in R. I have read its documentation, but I am still a bit confused. I have 5 vectors of gene names. For example:
v1 <- c("geneA","geneB",""...)
v2 <- c("geneA","geneC",""...)
v3 <- c("geneD","geneE",""...)
v4 <- c("geneA","geneE",""...)
v5 <- c("geneB","geneC",""...)
And I would like to find out which genes are present in at least two vectors. Some people have suggested:
Reduce(intersect,list(a,b,c,d,e))
I would greatly appreciate it if someone could explain how this statement works, because I have seen Reduce used in other scenarios.
Reduce takes a binary function and a list of data items and successively applies the function to the list elements in a recursive fashion. For example:
Reduce(intersect,list(a,b,c))
is the same as
intersect(intersect(a,b),c)
However, I don't think that construct will help you here as it will only return those elements that are common to all vectors.
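The equivalence above can be checked on concrete data. This is only a minimal sketch: the three vectors `x1`, `x2`, `x3` are made up for illustration and are not the asker's data.

```r
# Hypothetical example vectors (not taken from the question)
x1 <- c("geneA", "geneB", "geneC")
x2 <- c("geneA", "geneC", "geneD")
x3 <- c("geneA", "geneC", "geneE")

# Reduce folds the list from the left: intersect(intersect(x1, x2), x3)
folded <- Reduce(intersect, list(x1, x2, x3))
nested <- intersect(intersect(x1, x2), x3)

identical(folded, nested)
# [1] TRUE
folded
# [1] "geneA" "geneC"
```

Only the genes common to all three vectors survive, which is why this construct answers "present in every vector" rather than "present in at least two".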
To count the number of vectors that a gene appears in you could do the following:
vlist <- list(v1,v2,v3,v4,v5)
addmargins(
  table(gene = unlist(vlist),
        vec = rep(paste0("v", 1:5), times = sapply(vlist, length))),
  2,
  list(Count = function(x) sum(x[x > 0]))
)
       vec
gene    v1 v2 v3 v4 v5 Count
  geneA  1  1  0  1  0     3
  geneB  1  0  0  0  1     2
  geneC  0  1  0  0  1     2
  geneD  0  0  1  0  0     1
  geneE  0  0  1  1  0     2
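The one-liner above packs several steps together. Here is a rough step-by-step sketch of the same idea. It assumes each vector holds only the two genes shown in the question (the originals are truncated), and it uses `rowSums(tab > 0)` in place of the `addmargins` call, which is a substitution of mine that counts each vector at most once per gene.

```r
# Assumption: the vectors contain only the entries visible in the question
v1 <- c("geneA", "geneB")
v2 <- c("geneA", "geneC")
v3 <- c("geneD", "geneE")
v4 <- c("geneA", "geneE")
v5 <- c("geneB", "geneC")
vlist <- list(v1, v2, v3, v4, v5)

genes <- unlist(vlist)                       # all gene names, in list order
vec <- rep(paste0("v", seq_along(vlist)),    # one source label per gene name
           times = lengths(vlist))

tab <- table(gene = genes, vec = vec)        # gene-by-vector occurrence counts
counts <- rowSums(tab > 0)                   # number of vectors containing each gene

names(counts)[counts >= 2]
# [1] "geneA" "geneB" "geneC" "geneE"
```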
A nice way to see what Reduce() is doing is to run it with its argument accumulate=TRUE. When accumulate=TRUE, it will return a vector or list in which each element shows its state after processing the first n elements of the list in x. Here are a couple of examples:
Reduce(`*`, x=list(5,4,3,2), accumulate=TRUE)
# [1] 5 20 60 120
i2 <- seq(0,100,by=2)
i3 <- seq(0,100,by=3)
i5 <- seq(0,100,by=5)
Reduce(intersect, x=list(i2,i3,i5), accumulate=TRUE)
# [[1]]
# [1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36
# [20] 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74
# [39] 76 78 80 82 84 86 88 90 92 94 96 98 100
#
# [[2]]
# [1] 0 6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96
#
# [[3]]
# [1] 0 30 60 90
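To make the accumulation explicit, the same result can be produced with a plain loop. The helper `fold_accumulate` below is a hypothetical name of mine, not part of base R, and is only a sketch of what Reduce() does from the left on a non-empty list with no init value.

```r
# Hypothetical helper: a hand-rolled left fold mimicking
# Reduce(f, x, accumulate = TRUE) for a non-empty list x
fold_accumulate <- function(f, x) {
  out <- vector("list", length(x))
  out[[1]] <- x[[1]]                       # first state is the first element
  for (i in seq_along(x)[-1]) {
    out[[i]] <- f(out[[i - 1]], x[[i]])    # combine previous state with next element
  }
  out
}

i2 <- seq(0, 100, by = 2)
i3 <- seq(0, 100, by = 3)
i5 <- seq(0, 100, by = 5)

steps <- fold_accumulate(intersect, list(i2, i3, i5))
steps[[3]]
# [1]  0 30 60 90
```

Each `steps[[i]]` is the running intersection after the first `i` vectors, matching the `[[1]]`, `[[2]]`, `[[3]]` output shown above.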
Assuming the input values given at the end of this answer, the expression
Reduce(intersect,list(a,b,c,d,e))
## character(0)
gives the genes that are present in all vectors, not the genes that are present in at least two vectors. It means:
intersect(intersect(intersect(intersect(a, b), c), d), e)
## character(0)
If we want the genes that are in at least two vectors:
L <- list(a, b, c, d, e)
u <- unlist(lapply(L, unique)) # or: Reduce(c, lapply(L, unique))
tab <- table(u)
names(tab[tab > 1])
## [1] "geneA" "geneB" "geneC" "geneE"
or
sort(unique(u[duplicated(u)]))
## [1] "geneA" "geneB" "geneC" "geneE"
Note: We used:
a <- c("geneA","geneB")
b <- c("geneA","geneC")
c <- c("geneD","geneE")
d <- c("geneA","geneE")
e <- c("geneB","geneC")
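The table-based approach for "at least two vectors" extends to any threshold. The function name `genes_in_at_least` and its parameter `k` are hypothetical, introduced here only as a sketch, and the data repeats the five input vectors from the note.

```r
# Hypothetical helper: genes present in at least k of the vectors in L
genes_in_at_least <- function(L, k = 2) {
  u <- unlist(lapply(L, unique))    # count each gene at most once per vector
  tab <- table(u)                   # number of vectors each gene appears in
  names(tab)[as.vector(tab >= k)]
}

# Same inputs as in the note above
L <- list(c("geneA", "geneB"),
          c("geneA", "geneC"),
          c("geneD", "geneE"),
          c("geneA", "geneE"),
          c("geneB", "geneC"))

genes_in_at_least(L, 2)
# [1] "geneA" "geneB" "geneC" "geneE"
genes_in_at_least(L, 3)
# [1] "geneA"
```

Setting `k` to `length(L)` should give the same set of genes as `Reduce(intersect, L)`, though possibly in a different order, since `table` sorts its names.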