Most pro R users have advised me never to use loops in R. Use apply functions instead. The problem is that it is not that intuitive to write an apply equivalent for every for/while loop if you're not familiar with functional programming. Take the below example for instance.
F <- data.frame(name = c("a", "b", "c", "d"), var1 = c(1,0,0,1), var2 = c(0,0,1,1),
var3 = c(1,1,1,1), clus = c("one", "two", "three", "four"))
F$ObjTrim <- ""
for (i in 1:nrow(F))
{
for (j in 2:(ncol(F)-1))
{
if(F[i, j] == 1)
{F$ObjTrim[i] <- paste(F$ObjTrim[i], colnames(F)[j], sep = " ") }
}
print(i)
}
The objective here is to create a variable "ObjTrim" that, for each row, lists the names of all columns whose value equals 1. Can someone suggest a good apply equivalent for this?
The code above, for example, will give:
name var1 var2 var3 clus ObjTrim
1 a 1 0 1 one var1 var3
2 b 0 0 1 two var3
3 c 0 1 1 three var2 var3
4 d 1 1 1 four var1 var2 var3
Thanks!
Here you can avoid for loops using vectorization: colSums is vectorized, and here it counts the matches in each column of the logical matrix F==1 (TRUE counts as 1, FALSE as 0).
colnames(F)[colSums(F==1) != 0]
Here is a test using my reproducible example:
set.seed(1234)
## create matrix 2*10
F <- matrix(sample(c(1:5),20,rep=TRUE),nrow=2,
dimnames = list(c('row1','row2'),paste0('col',1:10)))
# col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
# row1 1 4 5 1 4 4 2 2 2 1
# row2 4 4 4 2 3 3 5 5 2 2
colnames(F)[colSums(F==1) != 0]
# [1] "col1" "col4" "col10"
PS: Generally it is easy to replace for loops with an "R style solution", but there are some cases where it is difficult or impossible to do that, especially when there is recursion.
EDIT
After the OP's clarification, here is an apply solution:
F$ObjTrim <- apply(F,1,function(x) paste(colnames(F)[x==1],collapse=' '))
name var1 var2 var3 clus ObjTrim
1 a 1 0 1 one var1 var3
2 b 0 0 1 two var3
3 c 0 1 1 three var2 var3
4 d 1 1 1 four var1 var2 var3
As your comment to @agstudy's answer says that you do want this for each row, maybe this helps you:
df <- F [, 2:4]
df
# var1 var2 var3
# 1 1 0 1
# 2 0 0 1
# 3 0 1 1
# 4 1 1 1
ones <- which (df == 1, arr.ind=TRUE)
ones
# row col
# [1,] 1 1
# [2,] 4 1
# [3,] 3 2
# [4,] 4 2
# [5,] 1 3
# [6,] 2 3
# [7,] 3 3
# [8,] 4 3
You can aggregate this by row:
aggregate (col ~ row, ones, paste)
# row col
# 1 1 1, 3
# 2 2 3
# 3 3 2, 3
# 4 4 1, 2, 3
If you insist on having the colnames instead of indices, replace the cols in ones first:
ones <- as.data.frame (ones)
ones$col <- colnames (df)[ones$col]
aggregate (col ~ row, ones, paste)
# row col
# 1 1 var1, var3
# 2 2 var3
# 3 3 var2, var3
# 4 4 var1, var2, var3
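If a single space-separated string per row is wanted (as in the question's ObjTrim column), one option is to pass a collapse argument through aggregate to paste. The following is only a sketch: it rebuilds the indicator columns from the question as a hypothetical df, and the names idx and res are introduced here for illustration.

```r
# Hypothetical reconstruction of the indicator columns from the question.
df <- data.frame(var1 = c(1, 0, 0, 1),
                 var2 = c(0, 0, 1, 1),
                 var3 = c(1, 1, 1, 1))

# Row/column positions of every cell equal to 1.
idx <- which(as.matrix(df) == 1, arr.ind = TRUE)

# Long table: one line per (row, column name) pair.
ones <- data.frame(row = idx[, "row"],
                   col = colnames(df)[idx[, "col"]])

# Extra arguments to aggregate are handed on to FUN, so collapse reaches paste.
res <- aggregate(col ~ row, data = ones, FUN = paste, collapse = " ")
res
```

Here res$col is a plain character vector with one entry per row, which could be assigned to a new column as long as every row has at least one 1 (rows without any 1 would be missing from res).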
Of course, you could also use apply along the rows:
apply (df, 1, function (x) paste (colnames (df) [x == 1], collapse = " "))
# [1] "var1 var3" "var3" "var2 var3" "var1 var2 var3"
For your problem, vectorized functions exist, so neither for loops nor apply are needed.
However, there are cases where for loops are the clearer (faster to read) and sometimes also the faster-to-compute alternative. This is particularly the case when looping a few times lets you use vectorized functions and saves applying some other function over a large margin.
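As an illustration of that last point, here is a minimal sketch that loops over the three indicator columns (a short loop) while each step is vectorized over all rows. The data frame mirrors the question's example; the names vars, obj and hit are hypothetical helpers introduced only for this sketch.

```r
# Example data, mirroring the data frame from the question.
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1), var2 = c(0, 0, 1, 1),
                var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

vars <- c("var1", "var2", "var3")   # the few columns we loop over
obj  <- character(nrow(F))          # one empty string per row

for (v in vars) {
  hit <- F[[v]] == 1                # logical vector over all rows
  obj[hit] <- paste(obj[hit], v)    # append the column name where it matches
}

F$ObjTrim <- trimws(obj)            # drop the leading space left by paste
F
```

The loop body runs once per column rather than once per cell, so with many rows and few columns most of the work happens inside vectorized calls.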
To answer what seems to be your generic question rather than the example you cited (how to convert a for loop into an apply variant), the following may be a few useful pointers:
Consider the structure of the object that you are iterating over. There may be different types, for example:
a) Elements of a vector / matrix.
b) Rows / Columns of a matrix.
c) A dimension of a higher dimensional array.
d) Elements of a list (which within themselves may be one of the objects cited above).
e) Corresponding elements of multiple lists / vectors.
In each case, the function you employ may be slightly different but the strategy to use is the same. Moreover, learn the apply family. The various *pply functions are based on a similar abstraction but differ in what they take as input and what they return as output.
For the cases listed above, for example:
a) Elements of a vector: Look for already existing vectorized solutions (as given above), which are a core strength of R. On top of that, consider matrix algebra. Many problems that seem to require loops (or nested loops) can be written as equations in matrix algebra.
b) Rows / Columns of a matrix: Use apply with the correct value for the MARGIN argument (1 for rows, 2 for columns).
c) Similarly for higher dimensional arrays.
d) Elements of a list: Use lapply. If the output you return is a 'simple' structure (a scalar or a vector), you may consider sapply, which is essentially simplify2array(lapply(...)) and returns an array of the appropriate dimensions.
e) Corresponding elements of multiple lists / vectors: Use mapply. The 'm' can stand for multivariate apply.
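To make the mapping from cases to tools concrete, here is a small sketch using toy inputs. The objects m and lst are made up for illustration and are not part of the question.

```r
# b) Rows / columns of a matrix: apply with MARGIN = 1 (rows) or 2 (columns).
m <- matrix(1:6, nrow = 2)                 # hypothetical 2 x 3 matrix
row_sums <- apply(m, 1, sum)               # 9 12
col_sums <- apply(m, 2, sum)               # 3 7 11

# d) Elements of a list: lapply returns a list, sapply simplifies if it can.
lst <- list(a = 1:3, b = 4:6)              # hypothetical list
lens  <- lapply(lst, length)               # list(a = 3, b = 3)
means <- sapply(lst, mean)                 # named numeric vector: a = 2, b = 5

# e) Corresponding elements of several vectors: mapply walks them in parallel.
sums <- mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9
```

For simple arithmetic like the last line, the vectorized 1:3 + 4:6 would of course be preferable; mapply earns its keep when the function applied to each pair is not itself vectorized.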
Once you have understood the object you are iterating over and the corresponding tool, simplify your problem. Think not of the overall object you are iterating over but one instance of it. For example when iterating over rows of a matrix, forget about the matrix and remember only the row.
Now, write a function (or a lambda) that operates on only the one instance (element) of your iterand and simply `apply' it using the correct member of the *pply family.
Now, let's take your example problem to use this strategy and replicate the clean solution given by @agstudy.
The first thing to identify is that you are iterating over the rows of the matrix. Clearly, you understand this as your looping solution starts with for (i in 1:nrow(F)).
Identify apply as your friend.
Understand what you need to do with this row. First of all you want to find out which values are 1. Then you need to find the colnames of these values. And then find a way to concatenate these colnames. If I may take the liberty of rewriting @agstudy's solution to help explain:
process.row <- function (arow) {
ones <- arow == 1 # Returns logical vector.
cnames <- colnames(F)[ones] # Logical subsetting of the column names.
cnames <- paste(cnames, collapse=' ') # Paste the names together.
cnames # Return
}
And you get the solution:
F$ObjTrim = apply(X=F, MARGIN=1, FUN=process.row)
Then, when thinking like this becomes instinctive, you can use R's capability to write dense expressions such as:
F$ObjTrim = apply(F,1,function(x) paste(colnames(F)[x==1],collapse=' '))
which uses a 'lambda' rolled on-the-fly to get the job done.
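One caveat worth keeping in mind: apply converts a data frame to a matrix first, and a data frame that mixes character and numeric columns becomes a character matrix, so the comparison x == 1 is then made on strings. It works for the example data, but a more defensive sketch restricts apply to the indicator columns. The name ind below is hypothetical, and the column selection assumes the layout from the question.

```r
# Example data, mirroring the data frame from the question.
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1), var2 = c(0, 0, 1, 1),
                var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

# Keep only the numeric indicator columns, so the matrix stays numeric.
ind <- as.matrix(F[, c("var1", "var2", "var3")])

# Same lambda as above, applied to the numeric matrix row by row.
F$ObjTrim <- apply(ind, 1, function(x) paste(colnames(ind)[x == 1], collapse = " "))
F
```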