I have data for different tissues, like so:
      tissueA tissueB tissueC
gene1     4.5     6.2     5.8
gene2     3.2     4.7     6.6
I want to calculate a summary statistic for each gene i, defined as
x = Σ_{j=1..n} [1 - log2(i,j) / log2(i,max)] / (n - 1)
where n is the number of tissues (here 3), (i,j) is the value for gene i in tissue j, and (i,max) is the highest value for gene i across the n tissues (e.g. for gene1 it is 6.2).
Since I have to compute this term for each tissue of every gene (the sum runs over j = 1 to n) and then add the terms up, I wrote a for loop:
for (i in seq_along(x)) {
  my.max <- max(x[, i])
  my.statistic <- 1 - log2(x[, i]) / log2(my.max)
  my.sum <- sum(my.statistic)
  my.answer <- my.sum / 2  # n - 1 = 3 - 1 = 2
}
However, I am not sure how to apply this for loop to each row. Normally I would write a function and call apply(x, 1, function(x) ...), but I am not sure how a for loop can be turned into a function.
The expected output for gene1, for example, would be
(1 - log2(4.5)/log2(6.2))/2 + (1 - log2(5.8)/log2(6.2))/2 = 0.1060983
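The expected value for gene1 stated in the question can be checked directly in R. This is only a minimal sketch: the gene1 values are typed in by hand from the example table, and the variable names (gene1, gene1.max, terms, expected) are made up for illustration.

```r
# Values for gene1, typed in by hand from the example table
gene1 <- c(tissueA = 4.5, tissueB = 6.2, tissueC = 5.8)

n <- length(gene1)        # number of tissues (3)
gene1.max <- max(gene1)   # highest value for this gene (6.2)

# One term per tissue; the term for the maximum tissue is exactly 0
terms <- 1 - log2(gene1) / log2(gene1.max)

# Sum the terms and divide by n - 1
expected <- sum(terms) / (n - 1)
expected
```

Because the term for the maximum tissue is zero, summing over all tissues gives the same result as summing only over the other two, which is why the hand calculation in the question has just two terms.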
If you have a huge data set, you could also try plyr's adply(). It may be faster than apply() in some cases, but it is worth benchmarking both on your own data:
library(plyr)
adply(df, 1, function(x)
data.frame( my.stat = sum(1-log2((x[,x != max(x)]))/log2(max(x))) / (length(x)-1)))
#tissueA tissueB tissueC my.stat
#1 4.5 6.2 5.8 0.1060983
#2 3.2 4.7 6.6 0.2817665
Try this:
#data
df <- read.table(text=" tissueA tissueB tissueC
gene1 4.5 6.2 5.8
gene2 3.2 4.7 6.6")
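#calculation, one row (gene) at a time
apply(df, 1, function(i) {
  my.max <- max(i)                             # highest value for this gene
  my.statistic <- 1 - log2(i) / log2(my.max)   # one term per tissue
  my.sum <- sum(my.statistic)
  my.answer <- my.sum / (length(i) - 1)        # divide by n - 1
  my.answer
})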
#result
# gene1 gene2
# 0.1060983 0.2817665