
A basic R function

Tags: r

While reading R for programmers I saw this function:

oddcount <- function(x) {
  k <- 0
  for (n in x) {
    if (n %% 2 == 1) k <- k+1
  }
  return(k)
}

I would prefer to write it in a simpler style (e.g. in a Lisp such as Clojure):

(defn odd-count [xs]
  (count (filter odd? xs)))

I see that the function length is equivalent to count, and I can write odd? myself, so are there built-in map/filter/remove type functions?

ChrisR asked Jul 31 '12 11:07


2 Answers

In R, when you are working with vectors, people often prefer to work on the entire vector at once instead of looping through it (see, for example, this discussion).

In a sense, R has filtering "built in" through the ways in which you can select subsets of a vector. They are very handy in R, and there are a few ways to go about it. I'll show you a couple, but you'll pick up more if you read about R and look at other people's code on a site like this. I would also consider looking at ?which and ?'[', which have more examples than I do here.

The first way is simply to select which elements you want. You can use this if you know the indices of the elements you want:

x <- letters[1:10]
x
#  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

If we only want the first five letters, we can write:

x[1:5]
x[c(1,2,3,4,5)] # a more explicit version of the above

You can also select which elements you don't want by using a minus sign, for example:

x[-(6:10)]
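
As a quick check of the two forms above, this sketch confirms that keeping positions 1 to 5 and dropping positions 6 to 10 give the same result. The tryCatch part illustrates that R refuses an index that mixes positive and negative positions. The variable names are only for illustration.

```r
x <- letters[1:10]

first_five <- x[1:5]       # keep positions 1..5
drop_last  <- x[-(6:10)]   # drop positions 6..10

identical(first_five, drop_last)   # TRUE

# Mixing positive and negative positions in one index is an error,
# so the tryCatch falls through to the error handler.
mixed_ok <- tryCatch({ x[c(-1, 2)]; TRUE }, error = function(e) FALSE)
mixed_ok                           # FALSE
```

Note that the parentheses in -(6:10) matter: -6:10 is parsed as (-6):10, a sequence that runs from -6 up to 10 and therefore mixes negative and positive positions.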

Another way to select elements is by using a boolean vector:

x <- 1:5
selection <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
x[selection]   # only the second and fourth elements will remain

This matters because applying a comparison to a vector returns exactly this kind of boolean vector:

selection <- (x > 3)
selection
#  [1] FALSE FALSE FALSE  TRUE  TRUE

x[selection]   # select all elements of x greater than 3
x[x > 3]       # a shorthand version of the above

Once again, we can select the opposite of the comparison we use (note that since it is boolean, we use ! and not -):

x[!(x > 3)]    # select all elements less than or equal to 3
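
One caveat that is easier to see in a small sketch: when the vector contains missing values, a logical index and which() behave differently. The vector below is made up for illustration.

```r
x <- c(1, 5, NA, 4, 2)

by_logical <- x[x > 3]          # an NA in the index yields an NA in the result
by_which   <- x[which(x > 3)]   # which() keeps only the positions that are TRUE

by_logical                      # 5 NA 4
by_which                        # 5 4

# Counting matches while ignoring missing values
n_matches <- sum(x > 3, na.rm = TRUE)
n_matches                       # 2
```

If your data may contain NA, wrapping the comparison in which() is one way to avoid carrying missing values into the selection.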

If you want to test which elements of one vector also appear in another vector, consider the %in% operator. For example:

x <- letters[1:10]
x %in% c("d", "p", "e", "f", "y")
#  [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE

# Select all elements of x that are also "d", "p", "e", "f", or "y"
x[x %in% c("d", "p", "e", "f", "y")]  
# And to select everything not in that vector:
x[!(x %in% c("d", "p", "e", "f", "y"))]  
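
As far as I know, %in% is a thin wrapper around match(), so the logical vector above can be reproduced directly. A small sketch, with variable names chosen only for illustration:

```r
x <- letters[1:10]
wanted <- c("d", "p", "e", "f", "y")

via_in    <- x %in% wanted
# match() gives the position of each element of x in `wanted`, or 0 if absent
via_match <- match(x, wanted, nomatch = 0) > 0

identical(via_in, via_match)   # TRUE
selected <- x[via_in]
selected                       # "d" "e" "f"
```

See ?match for the details; the point here is only that membership tests produce an ordinary logical vector that you can use as an index.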

The above are only a few examples; I definitely recommend the documentation. I know this is a long post on a question that already has an accepted answer, but subsetting is fundamental in R, and understanding it will save you a lot of time and pain if you are new to the language, so I thought I'd share a couple of ways of doing it.
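
To address the map/filter part of the question by name: base R also has higher-order functions called Filter, Map and Reduce (documented together under ?Filter). The sketch below is one way to mirror the Lisp version; is_odd and odd_count are hypothetical names, not from the original post.

```r
is_odd <- function(n) n %% 2 == 1

# Filter the odd elements, then count them
odd_count <- function(xs) length(Filter(is_odd, xs))

odd_count(1:10)                                 # 5

evens   <- Filter(Negate(is_odd), 1:10)         # "remove": 2 4 6 8 10
doubled <- unlist(Map(function(n) n * 2, 1:3))  # Map returns a list: 2 4 6
total   <- Reduce(`+`, 1:10)                    # 55
```

For plain vectors, the indexing idioms shown earlier are usually what R code uses in practice; Filter and friends are handy when working with lists or when you want to pass functions around.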

Edward answered Oct 13 '22 00:10


A more idiomatic R way of doing this would be to avoid the for loop and use vectorization:

oddcount <- function(x) {
  sum(x %% 2 == 1)
}

The comparison x %% 2 == 1 returns a logical vector, because x itself is a vector. sum() then adds up that vector, counting TRUE as 1 and FALSE as 0. In this way the function calculates the number of odd numbers in the vector.

This already leads to simpler syntax, although for people who are not used to vectorization the for loop tends to be easier to read. I greatly prefer the vectorized syntax as it is much shorter. I would use a more descriptive name than x, though, e.g. number_vector.
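
For a side-by-side check, here is a sketch that compares the loop from the question with a vectorized count. The function names odd_count_loop and odd_count_vec are made up for this comparison, and the test vectors are illustrative.

```r
# Loop version from the question, renamed for comparison
odd_count_loop <- function(number_vector) {
  k <- 0
  for (n in number_vector) {
    if (n %% 2 == 1) k <- k + 1
  }
  k
}

# Vectorized version: count the TRUEs in a logical vector
odd_count_vec <- function(number_vector) {
  sum(number_vector %% 2 == 1)
}

whole <- c(1, 2, 3, 7, 9, -3)
odd_count_loop(whole)         # 5
odd_count_vec(whole)          # 5

fractional <- c(1, 2.5, 3)
odd_count_vec(fractional)     # 2
sum(fractional %% 2)          # 2.5: remainders summed, not a count
```

Summing the remainders directly, sum(x %% 2), gives the same count for whole numbers, but it stops being a count as soon as the input contains fractional values, which is why the comparison with 1 is kept here.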

Paul Hiemstra answered Oct 13 '22 01:10