 

Code Building Process and Embedded Functions

Tags:

r

process

First, this question isn't about solving a specific problem. As a newcomer to R, I'm working on writing more efficient code and on improving how I build it. I'm asking in order to get perspectives on different programming methods and styles.

Below are three ways to code something:

First here is the example data:

stackexample <- c(52,50,45,49.5,50.5,12,10,14,11.5,12,110,108,106,101,104)
dim(stackexample)<- c(5,3)

Method One: Do the math in the function without defining any objects

ertimesIVCV1 <- function (x) 
{ (solve(var(log((x[-nrow(x),])/(x[-1,])))))%*%
  ((1+(log(x[1,]/(x)[nrow(x),])))^(1/nrow(x))-1)}

ertimesIVCV1(stackexample)

Method Two: Define Objects in the function and then manipulate those objects

ertimesIVCV2 <- function (x) 
{ IVCV <- solve(var(log((x[-nrow(x),])/(x[-1,]))));
  retsexcess <- (1+(log(x[1,]/(x)[nrow(x),])))^(1/nrow(x))-1;
  IVCV%*%retsexcess}

ertimesIVCV2(stackexample)

Method Three: Define several functions and call them in a "summary-like" function

IVCV <- function (x) {solve(var(log((x[-nrow(x),])/(x[-1,]))))}
retsexcess <- function(x) (1+(log(x[1,]/(x)[nrow(x),])))^(1/nrow(x))-1
ertimesIVCV3 <- function (x) {IVCV(x)%*%retsexcess(x)}

ertimesIVCV3(stackexample)

So all produce the same answer:

           [,1]
[1,]  1.4430104
[2,] -0.1365155
[3,] 11.8088378

but, as you can see, they take three different approaches.

Is there such a thing as an optimal number of embedded functions, or should we always try to write all the math out explicitly? How many levels of functions within functions is optimal? Is any of these methods faster to compute? Is there a rule of thumb for this? How do you approach it? Any comments, suggestions, or links would be welcome, and thank you!

Rye

rsgmon asked Feb 28 '12 19:02



1 Answer

IMHO, speed should be the last of your concerns when writing code, especially if you are a beginner. Instead, your primary focus should be on simplicity, readability, and modularity. Don't get me wrong: efficiency is a great thing, and you'll find many ways to make your code faster when needed, but it should not be a priority in itself.
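If you do want to see whether the choice matters for speed, here is a rough sketch using base R's system.time. It copies Method One and Method Three from the question (lightly reformatted), checks that they agree, and times repeated calls. The repetition count n.reps is an arbitrary value picked for illustration, and timings will vary from machine to machine, so treat the numbers as indicative only.

```r
# Example data from the question (a matrix is filled column by column)
stackexample <- matrix(c(52, 50, 45, 49.5, 50.5,
                         12, 10, 14, 11.5, 12,
                         110, 108, 106, 101, 104),
                       nrow = 5, ncol = 3)

# Method One from the question: everything in a single expression
ertimesIVCV1 <- function(x) {
   solve(var(log(x[-nrow(x), ] / x[-1, ]))) %*%
      ((1 + log(x[1, ] / x[nrow(x), ])) ^ (1 / nrow(x)) - 1)
}

# Method Three from the question: small helper functions
IVCV <- function(x) solve(var(log(x[-nrow(x), ] / x[-1, ])))
retsexcess <- function(x) (1 + log(x[1, ] / x[nrow(x), ])) ^ (1 / nrow(x)) - 1
ertimesIVCV3 <- function(x) IVCV(x) %*% retsexcess(x)

# Both versions should agree before any timing is worth looking at
same.result <- isTRUE(all.equal(ertimesIVCV1(stackexample),
                                ertimesIVCV3(stackexample)))

# Time many repetitions, since a single call is too fast to measure
n.reps <- 2000  # arbitrary choice for illustration
time.one   <- system.time(for (i in seq_len(n.reps)) ertimesIVCV1(stackexample))
time.three <- system.time(for (i in seq_len(n.reps)) ertimesIVCV3(stackexample))

print(same.result)
print(c(one = time.one[["elapsed"]], three = time.three[["elapsed"]]))
```

Both versions do the same matrix work, so any gap between them comes only from the extra function calls.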

So I'll mostly be giving tips about style. To illustrate, here is what my version of your code would look like. Please bear in mind that I do not know what your code is computing, so I did my best to break it up using meaningful variable names.

IVCV <- function(stack) {

## This function computes [...] IVCV stands for [...]
## Inputs:
##    - stack: a matrix where each column [...]
## Output: a matrix [...]

   n <- nrow(stack) # stack size
   stack.ratios  <- stack[-n, ] / stack[-1, ]
   log.ratios    <- log(stack.ratios)
   ivcv          <- solve(var(log.ratios))

   return(ivcv)
}

ExcessReturn <- function(stack) {

## This function computes [...]
## Inputs:
##    - stack: a matrix where each column [...]
## Output: a matrix [...]

   n <- nrow(stack) # stack size
   total.ratio   <- stack[1, ] / stack[n, ]
   excess.return <- (1 + log(total.ratio)) ^ (1 / n) - 1

   return(excess.return)
}

ExcessReturnTimesIVCV <- function(stack) {

## This function computes [...] IVCV stands for [...]
## Inputs:
##    - stack: a matrix where each column [...]
## Output: a vector [...]

   return(IVCV(stack) %*% ExcessReturn(stack))
}
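As a quick sanity check on a refactoring like this, one could test each small function on the question's example data and compare the final result against the original one-expression version. The sketch below repeats the functions in condensed form (without the comment headers) so that it runs on its own. The particular checks are only illustrative choices, since I do not know what properties the real computation is supposed to have.

```r
# Example data from the question
stackexample <- matrix(c(52, 50, 45, 49.5, 50.5,
                         12, 10, 14, 11.5, 12,
                         110, 108, 106, 101, 104),
                       nrow = 5, ncol = 3)

# Original one-expression version from the question (Method One)
ertimesIVCV1 <- function(x) {
   solve(var(log(x[-nrow(x), ] / x[-1, ]))) %*%
      ((1 + log(x[1, ] / x[nrow(x), ])) ^ (1 / nrow(x)) - 1)
}

# Refactored functions, condensed and repeated so the sketch is self-contained
IVCV <- function(stack) {
   n <- nrow(stack)
   log.ratios <- log(stack[-n, ] / stack[-1, ])
   return(solve(var(log.ratios)))
}

ExcessReturn <- function(stack) {
   n <- nrow(stack)
   total.ratio <- stack[1, ] / stack[n, ]
   return((1 + log(total.ratio)) ^ (1 / n) - 1)
}

ExcessReturnTimesIVCV <- function(stack) {
   return(IVCV(stack) %*% ExcessReturn(stack))
}

# Each small piece can be checked on its own
ivcv <- IVCV(stackexample)
stopifnot(identical(dim(ivcv), c(3L, 3L)))          # one row and column per data column
stopifnot(length(ExcessReturn(stackexample)) == 3)  # one value per data column

# The refactored whole should match the original version
stopifnot(isTRUE(all.equal(ExcessReturnTimesIVCV(stackexample),
                           ertimesIVCV1(stackexample))))
```

stopifnot stays silent when every check passes and stops with an error otherwise, which is enough for an informal test like this one.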

1) Yes, break your code into small functions. It is better for readability, flexibility, and maintenance. It also makes unit testing easier, since you can design tests for each elementary piece of code.

2) Document each function with comments describing its purpose, inputs, and output inside the body of the function. This way, once the function is created, the user can see its description as part of the function's printout (e.g., just type ExcessReturnTimesIVCV in the GUI). This works as long as R keeps the function's source, which is the default in interactive sessions.

3) Break complexity out into multiple statements. Right now, all three of your versions are hard to understand, with too many things going on in each line. A statement should do one simple thing so that it reads easily. Creating more objects is unlikely to slow down your code, and it will make debugging much easier.

4) Your object names are key to making your code clear. Choose them well and use a consistent naming convention. I use UpperCamelCase for my own functions' names, and lowercase words separated with dots for most other objects.

5) Add comments, especially where 3) and 4) are not enough to make the code clear. In my example, I chose to use a variable n. I went against the recommendation that variable names should be descriptive, but it was to make the code a little lighter and to give expressions like stack[-n, ] / stack[-1, ] some nice symmetry. Since n is a bad name, I added a comment explaining its meaning. I might also have put more comments in the code if I knew what the functions were really doing.

6) Use consistent syntax rules, mostly to improve readability. You'll hear different opinions about what should be used here. In general, there is not one best approach. The most important thing is to make a choice and stick with it. So here are my suggestions:

a) One statement per line, no semicolons.

b) Consistent spacing and indentation (no tabs). I put spaces after commas and around binary operators. I also use extra spacing to line things up if it helps readability.

c) Consistent bracing: be careful with the way you use curly brackets to define blocks, otherwise you are likely to run into problems in script mode. See Section 8.1.43 of the R Inferno (a great reference).
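To illustrate the kind of bracing problem that can show up in script mode, here is a small sketch of one well-known pitfall: an else that starts its own line at top level. I am offering it as an example of my own choosing, not as a summary of what that R Inferno section says. The code strings and variable names are made up for the demonstration, and parse is used only so that the syntax error can be shown without stopping the script.

```r
# An 'else' that starts its own line at top level: R treats the 'if' statement
# as complete at the end of the first line, so the 'else' is a syntax error.
bad.code <- "if (TRUE) print('yes')\nelse print('no')"
bad.result <- try(parse(text = bad.code), silent = TRUE)
print(inherits(bad.result, "try-error"))

# Keeping 'else' on the same line as the closing brace parses without error.
good.code <- "if (TRUE) {\n   print('yes')\n} else {\n   print('no')\n}"
good.result <- try(parse(text = good.code), silent = TRUE)
print(inherits(good.result, "try-error"))
```

Inside a function body or any other pair of curly brackets the first form is accepted, which is why the problem tends to appear only when the same lines are run at top level in a script.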

Good luck!

flodel answered Oct 06 '22 02:10
