
How to put an apply equivalent to any for loop

Tags:

r

apply

Most experienced R users have advised me never to use loops in R and to use the apply functions instead. The problem is that it is not very intuitive to write an apply equivalent for every for/while loop if you're not familiar with functional programming. Take the example below, for instance.

F <- data.frame(name = c("a", "b", "c", "d"), var1 = c(1,0,0,1), var2 = c(0,0,1,1),  
var3 = c(1,1,1,1), clus = c("one", "two", "three", "four"))
F$ObjTrim <- ""
for (i in 1:nrow(F)) {
  for (j in 2:(ncol(F) - 1)) {
    # append the column name whenever the cell equals 1
    if (F[i, j] == 1) {
      F$ObjTrim[i] <- paste(F$ObjTrim[i], colnames(F)[j], sep = " ")
    }
  }
  print(i)  # progress output
}

The objective here is to create a variable "ObjTrim" that, for each row, holds the names of all the columns whose value == 1. Can someone suggest a good apply equivalent to this?

For example, the code above gives:

 name var1 var2 var3  clus         ObjTrim
1    a    1    0    1   one       var1 var3
2    b    0    0    1   two            var3
3    c    0    1    1 three       var2 var3
4    d    1    1    1  four  var1 var2 var3

Thanks!

Shreyes asked Jun 09 '13 07:06



3 Answers

Here you can avoid for loops by using vectorization: F == 1 gives a logical matrix, and the vectorized colSums counts the TRUE values in each column (each TRUE counts as 1, each FALSE as 0).

 colnames(F)[colSums(F == 1) != 0]  ## names of the columns that contain at least one 1

Here is a test using my reproducible example:

set.seed(1234)
## create a 2 x 10 matrix of random integers between 1 and 5
F <- matrix(sample(1:5, 20, replace = TRUE), nrow = 2,
            dimnames = list(c('row1', 'row2'), paste0('col', 1:10)))

#        col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
# row1    1    4    5    1    4    4    2    2    2     1
# row2    4    4    4    2    3    3    5    5    2     2
colnames(F)[colSums(F==1) != 0]
# [1] "col1"  "col4"  "col10"

PS: Generally it is easy to replace for loops with an "R style solution", but there are some cases where it is difficult or impossible to do so, especially when recursion is involved.

EDIT

After the OP's clarification, here is an apply solution:

F$ObjTrim <- apply(F,1,function(x) paste(colnames(F)[x==1],collapse=' '))

 name var1 var2 var3  clus        ObjTrim
1    a    1    0    1   one      var1 var3
2    b    0    0    1   two           var3
3    c    0    1    1 three      var2 var3
4    d    1    1    1  four var1 var2 var3
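One caveat worth spelling out: apply coerces a data frame with as.matrix, and with mixed column types that gives a character matrix, so x==1 above ends up comparing strings. It works for this data. A minimal sketch that does not rely on that coercion is to pass only the indicator columns; the name ind_cols is my own, and I am assuming var1 to var3 are the only columns of interest:

```r
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1), var2 = c(0, 0, 1, 1), var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

ind_cols <- c("var1", "var2", "var3")   # assumed indicator columns (hypothetical name)
ind <- as.matrix(F[, ind_cols]) == 1    # logical matrix, one row per row of F

# For each row, keep the names of the columns that are TRUE and join them.
F$ObjTrim <- unname(apply(ind, 1, function(x) paste(ind_cols[x], collapse = " ")))
```

The result is the same ObjTrim column as above, but the comparison is done on numbers rather than on formatted strings.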
agstudy answered Oct 19 '22 21:10



As your comment to @agstudy's answer says that you do want this for each row, maybe this helps you:

df <- F [, 2:4]
df
#   var1 var2 var3
# 1    1    0    1
# 2    0    0    1
# 3    0    1    1
# 4    1    1    1

ones <- which (df == 1, arr.ind=TRUE)
ones
#      row col
# [1,]   1   1
# [2,]   4   1
# [3,]   3   2
# [4,]   4   2
# [5,]   1   3
# [6,]   2   3
# [7,]   3   3
# [8,]   4   3

This you can aggregate by row:

aggregate (col ~ row, ones, paste)
#   row     col
# 1   1    1, 3
# 2   2       3
# 3   3    2, 3
# 4   4 1, 2, 3

If you insist on having the colnames instead of indices, replace the cols in ones first:

ones <- as.data.frame (ones) 
ones$col <- colnames (df)[ones$col]
aggregate (col ~ row, ones, paste)
#   row              col
# 1   1       var1, var3
# 2   2             var3
# 3   3       var2, var3
# 4   4 var1, var2, var3

Of course, you could also use apply along the rows:

apply (df, 1, function (x) paste (colnames (df) [x == 1], collapse = " "))
# [1] "var1 var3"       "var3"             "var2 var3"       "var1 var2 var3"

For your problem, vectorized functions exist so neither for loops nor apply are needed.

However, there are cases where a for loop is the clearer (faster to read) and sometimes also the faster-to-compute alternative. This is particularly the case when looping a few times lets you use vectorized functions inside the loop, instead of applying some other function over a large margin.
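As a rough illustration of that last point for this question (a sketch only; the vector of column names and the use of trimws are my choices, not something from the original post): loop over the three columns and let the vectorized paste handle all rows at once.

```r
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1), var2 = c(0, 0, 1, 1), var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

obj <- character(nrow(F))                 # one empty string per row
for (nm in c("var1", "var2", "var3")) {   # only three iterations
  hit <- F[[nm]] == 1                     # rows where this column is 1
  obj[hit] <- paste(obj[hit], nm)         # vectorized over all matching rows
}
F$ObjTrim <- trimws(obj)                  # drop the leading space left by paste
```

The loop runs once per column rather than once per cell, so the number of iterations stays small even if the data frame has many rows.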

cbeleites unhappy with SX answered Oct 19 '22 21:10



To answer what seems to be your generic question rather than the example you cited (how to convert a for loop into an apply variant), the following pointers may be useful:

  1. Consider the structure of the object that you are iterating over. There may be different types, for example:

    a) Elements of a vector / matrix.

    b) Rows / Columns of a matrix.

    c) A dimension of a higher dimensional array.

    d) Elements of a list (which may themselves be one of the objects cited above).

    e) Corresponding elements of multiple lists / vectors.

    In each case, the function you employ may be slightly different but the strategy is the same. Moreover, learn the apply family. The various *pply functions are based on a similar abstraction but differ in what they take as input and what they return as output.

  2. For each case in the list above, pick the matching tool:

    a) Elements of a vector: Look for already existing vectorized solutions (as given above) which are a core strength in R. On top of that consider matrix algebra. Most problems that seem to require loops (or nested loops) can be written as equations in matrix algebra.

    b) Rows / Columns of a matrix: Use apply. Use the correct value for the MARGIN argument. Similarly for c), higher dimensional arrays.

    d) Elements of a list: Use lapply. If the value you return for each element is a 'simple' structure (a scalar or a vector), you may consider sapply, which is essentially simplify2array(lapply(...)) and returns a vector or array of the appropriate dimensions.

    e) Use mapply. The 'm' can stand for multivariate apply.

  3. Once you have understood the object you are iterating over and the corresponding tool, simplify your problem. Think not of the overall object you are iterating over but one instance of it. For example when iterating over rows of a matrix, forget about the matrix and remember only the row.

    Now, write a function (or a lambda) that operates on only one instance (element) of your iterand and simply 'apply' it using the correct member of the *pply family.
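To make cases a) to e) above a bit more concrete, here is a small sketch with toy data. All object names are made up for illustration; only base R functions are used.

```r
# a) Elements of a vector: prefer an existing vectorized function.
v <- c(1, 4, 9)
roots <- sqrt(v)

# b) Rows / columns of a matrix: apply with MARGIN = 1 (rows) or 2 (columns).
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_max  <- apply(m, 2, max)

# c) A dimension of a higher dimensional array: same idea, another MARGIN.
arr <- array(1:24, dim = c(2, 3, 4))
slice_sums <- apply(arr, 3, sum)

# d) Elements of a list: lapply returns a list, sapply simplifies it.
lst <- list(a = 1:3, b = 4:6)
means_list <- lapply(lst, mean)
means_vec  <- sapply(lst, mean)

# e) Corresponding elements of several vectors: mapply.
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)
```

In each case the function passed in (sum, max, mean, or the anonymous function) only ever sees one instance: one row, one slice, one list element, or one pair.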

Now, let's take your example problem to use this strategy and replicate the clean solution given by @agstudy.

  1. The first thing to identify is that you are iterating over the rows of the matrix. Clearly, you understand this as your looping solution starts with for (i in 1:nrow(F)).

  2. Identify apply as your friend.

  3. Understand what you need to do with this row. First of all you want to find out which values are 1. Then you need to find the colnames of these values. And then find a way to concatenate these colnames. If I may take the liberty of rewriting @agstudy's solution to help explain:

    process.row <- function (arow) {
      ones <- arow == 1 # Returns logical vector.
      cnames <- colnames(F)[ones] # Logical subsetting of the column names of F.
      cnames <- paste(cnames, collapse=' ') # Paste the names together.
      cnames # Return
    }
    

    And you get the solution:

    F$ObjTrim = apply(X=F, MARGIN=1, FUN=process.row)
    

    Then, when thinking like this becomes instinctive, you can use R's capability to write dense expressions such as:

    F$ObjTrim = apply(F,1,function(x) paste(colnames(F)[x==1],collapse=' '))
    

which uses a 'lambda' rolled on-the-fly to get the job done.
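If you want to convince yourself that the apply version matches the original loop, a quick check along these lines may help. This is a sketch: loop_res and apply_res are names I made up, and I loop over columns 2 to 4 directly. Note that the loop leaves a leading space in each string because it starts pasting from "", so I compare after trimws.

```r
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1), var2 = c(0, 0, 1, 1), var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

# Loop version, as in the question (restricted to the var columns).
loop_res <- character(nrow(F))
for (i in seq_len(nrow(F))) {
  for (j in 2:4) {
    if (F[i, j] == 1) {
      loop_res[i] <- paste(loop_res[i], colnames(F)[j], sep = " ")
    }
  }
}

# apply version, as in the answers.
apply_res <- apply(F, 1, function(x) paste(colnames(F)[x == 1], collapse = " "))

# The two agree once the loop's leading space is removed.
same <- identical(trimws(loop_res), unname(apply_res))
```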

asb answered Oct 19 '22 22:10
