Apply function to every value in an R dataframe

Tags:

r

I have a data frame with 58 columns, and I need to apply the transformation $\log(x_{i,j} + 1)$ to every value in the first 56 columns. What is the most efficient way to do this? I assume there is something better than running for loops over the entire data frame.

Hoser asked Mar 05 '13 04:03



2 Answers

alexwhan's answer is right for log (and should probably be selected as the correct answer). However, it works so cleanly only because log is vectorized. I have run into the special pain of non-vectorized functions all too often. When I started with R and didn't understand the apply family well, I resorted to ugly loops very often. So, for those who stumble onto this question with a function that is not vectorized, I provide the following proof of concept.

# Creating sample data
df <- as.data.frame(matrix(runif(56 * 56), 56, 56))

# Writing an ugly non-vectorized function
logplusone <- function(x) {log(x[1] + 1)}

# Example code that achieves the desired result, despite the lack of a vectorized function
df[, 1:56] <- as.data.frame(lapply(df[, 1:56], FUN = function(x) {sapply(x, FUN = logplusone)}))

# Proof that the results are the same using both methods...
# Note: I used all.equal rather than all so that the values are tested using machine tolerance for mathematical equivalence. This is probably a non-issue for the current example, but might be relevant with some other testing functions.
# Should evaluate to TRUE
all.equal(log(df[, 1:56] + 1), as.data.frame(lapply(df[, 1:56], FUN = function(x) {sapply(x, FUN = logplusone)})))
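As a variation on the code above (a sketch under stated assumptions, not the only way to do this): `vapply()` can replace the inner `sapply()` so that R checks each call returns exactly one number, and assigning the `lapply()` result into `df[1:56]` keeps the data frame structure without a separate `as.data.frame()` call. The 58-column sample data and the names `logplusone_scalar` and `original` are made up for illustration.

```r
# Hypothetical sample data: 58 columns, of which only the first 56 are transformed
set.seed(1)
df <- as.data.frame(matrix(runif(10 * 58), nrow = 10, ncol = 58))
original <- df  # untouched copy for comparison

# A deliberately non-vectorized function: it only looks at its first element
logplusone_scalar <- function(x) log(x[1] + 1)

# Apply it element by element within each column;
# vapply() raises an error if any call returns anything other than one numeric value
df[1:56] <- lapply(df[1:56], function(col) vapply(col, logplusone_scalar, numeric(1)))

# Compare against the vectorized result computed from the untouched copy
same_as_vectorized <- isTRUE(all.equal(df[1:56], log(original[1:56] + 1)))

# Columns 57 and 58 should not have changed
untouched <- isTRUE(all.equal(df[57:58], original[57:58]))
```

Comparing against a copy taken before the assignment makes it explicit that both methods start from the same input.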
russellpierce answered Sep 24 '22 23:09


You should be able to just refer to the columns you want and do the operation, i.e.:

df[,1:56] <- log(df[,1:56]+1)
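A small self-contained check of this approach, assuming a made-up 58-column numeric data frame (the sample data and the names `before`, `transformed_ok`, and `rest_unchanged` are illustrative only). Base R's `log1p(x)` computes log(1 + x) and is more accurate when x is close to zero, so it is used here as a possible stand-in for `log(x + 1)`.

```r
# Hypothetical sample data: 58 numeric columns
set.seed(42)
df <- as.data.frame(matrix(runif(5 * 58), nrow = 5, ncol = 58))
before <- df  # untouched copy for comparison

# Transform the first 56 columns in one vectorized step;
# log1p(x) is mathematically the same as log(x + 1)
df[, 1:56] <- log1p(df[, 1:56])

# The result should match log(x + 1) within numerical tolerance
transformed_ok <- isTRUE(all.equal(df[, 1:56], log(before[, 1:56] + 1)))

# Columns 57 and 58 are left as they were
rest_unchanged <- isTRUE(all.equal(df[, 57:58], before[, 57:58]))
```

This relies on every selected column being numeric; a factor or character column among the first 56 would make the call fail.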
alexwhan answered Sep 21 '22 23:09