Use apply() function when you wanted to update every row in pandas DataFrame by calling a custom function. In order to apply a function to every row, you should use axis=1 param to apply().
The apply() collection is a part of R essential package. This family of functions helps us to apply a certain function to a certain data frame, list, or vector and return the result as a list or vector depending on the function we use.
DataFrame - apply() function. The apply() function is used to apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
You can apply apply
to a subset of the original data.
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x) )
or if your function is just sum use the vectorized version:
rowSums(dat[,c('x','z')])
[1] 6 8
If you want to use testFunc
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))
EDIT To access columns by name and not index you can do something like this:
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'],y['x']))
A data.frame
is a list
, so ...
For vectorized functions do.call
is usually a good bet. But the names of arguments come into play. Here your testFunc
is called with args x and y in place of a and b. The ...
allows irrelevant args to be passed without causing an error:
do.call( function(x,z,...) testFunc(x,z), df )
For non-vectorized functions, mapply
will work, but you need to match the ordering of the args or explicitly name them:
mapply(testFunc, df$x, df$z)
Sometimes apply
will work - as when all args are of the same type so coercing the data.frame
to a matrix does not cause problems by changing data types. Your example was of this sort.
If your function is to be called within another function into which the arguments are all passed, there is a much slicker method than these. Study the first lines of the body of lm()
if you want to go that route.
Use mapply
> df <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
> df
x y z
1 1 3 5
2 2 4 6
> mapply(function(x,y) x+y, df$x, df$z)
[1] 6 8
> cbind(df,f = mapply(function(x,y) x+y, df$x, df$z) )
x y z f
1 1 3 5 6
2 2 4 6 8
dplyr
packageIf the function that you want to apply is vectorized,
then you could use the mutate
function from the dplyr
package:
> library(dplyr)
> myf <- function(tens, ones) { 10 * tens + ones }
> x <- data.frame(hundreds = 7:9, tens = 1:3, ones = 4:6)
> mutate(x, value = myf(tens, ones))
hundreds tens ones value
1 7 1 4 14
2 8 2 5 25
3 9 3 6 36
plyr
packageIn my humble opinion,
the tool best suited to the task is mdply
from the plyr
package.
Example:
> library(plyr)
> x <- data.frame(tens = 1:3, ones = 4:6)
> mdply(x, function(tens, ones) { 10 * tens + ones })
tens ones V1
1 1 4 14
2 2 5 25
3 3 6 36
Unfortunately, as Bertjan Broeksema pointed out,
this approach fails if you don't use all the columns of the data frame
in the mdply
call.
For example,
> library(plyr)
> x <- data.frame(hundreds = 7:9, tens = 1:3, ones = 4:6)
> mdply(x, function(tens, ones) { 10 * tens + ones })
Error in (function (tens, ones) : unused argument (hundreds = 7)
Others have correctly pointed out that mapply
is made for this purpose, but (for the sake of completeness) a conceptually simpler method is just to use a for
loop.
for (row in 1:nrow(df)) {
df$newvar[row] <- testFunc(df$x[row], df$z[row])
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With