
How to call a function for each row of a data.frame?

I have a function with several parameters. This function returns a data.frame.

I have another data.frame.

Now I would like to call my function once for each row of that data.frame, passing the row's values as parameters, and then rbind the resulting data.frames.

So I thought something like

do.call(rbind, apply(df, 1, f))

is my friend.

But during this call df gets converted to a matrix, and in the process all numbers are converted to characters. So I have to modify my function to convert them back. That's clumsy, and I'm afraid I'm missing something.

So my question is, how can I do this?

As example see the following code:

Sys.setenv(LANG = "en")
# Create data.frame
df <- data.frame(
  a = c('a', 'b', 'c'),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# My function 
f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[['a']], x[['b']]), collapse=''),x[['b']]),
    y = 2 * x[['b']],
    stringsAsFactors = FALSE
  )
}

apply(df, 1, f)

Here I get the error:

Error in 2 * x[["b"]] : non-numeric argument to binary operator 
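The error follows from the coercion described above: apply() converts the data.frame with as.matrix(), and a matrix can hold only one type, so the numeric column b ends up as character. A minimal check, assuming the same example data (the names m and row1 are only for illustration):

```r
# Same example data as in the question
df <- data.frame(
  a = c('a', 'b', 'c'),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# apply() works on the matrix form of df; mixed column types force character
m <- as.matrix(df)
is.character(m)        # TRUE: every cell, including column b, is a string
is.numeric(df[['b']])  # TRUE: the original column is still numeric

# Each "row" handed to the function is a named character vector
row1 <- m[1, ]
is.character(row1[['b']])  # TRUE, which is why 2 * x[['b']] fails
```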

So I change function f to function g:

g <- function(x) {
  data.frame(
    x = rep(paste(rep(x[['a']], as.numeric(x[['b']])), collapse=''), as.numeric(x[['b']])),
    y = 2 * as.numeric(x[['b']]),
    stringsAsFactors = FALSE
  )
}

Now I can call

 do.call(rbind, apply(df, 1, g))

and I get

    x y
1   a 2
2  bb 4
3  bb 4
4 ccc 6
5 ccc 6
6 ccc 6

I tried to use a for-loop.

result <- f(df[1,])
for(i in 2:nrow(df)){
  result <- rbind(result, f(df[i,]))
}
result

That does work, but it can't be the R way. for-loops aren't "R-ish", and there's too much that can go wrong: df might be empty or have only one row, for example.
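The edge cases come from 2:nrow(df), which counts downwards when df has one row or none. A sketch of a guarded variant, assuming the row-taking f from above is kept unchanged (rows and result are illustrative names):

```r
# Same example data and function as in the question
df <- data.frame(
  a = c('a', 'b', 'c'),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[['a']], x[['b']]), collapse=''), x[['b']]),
    y = 2 * x[['b']],
    stringsAsFactors = FALSE
  )
}

# seq_len(nrow(df)) is empty for a zero-row df, so nothing is called then.
# df[i, ] is a one-row data.frame, so column b keeps its numeric type.
rows <- lapply(seq_len(nrow(df)), function(i) f(df[i, ]))

# Bind all pieces in one step instead of growing result inside a loop
result <- do.call(rbind, rows)
result
```

For a zero-row df, rows is an empty list and do.call(rbind, rows) returns NULL, so a caller may still want to handle that case explicitly.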

So what's the base-R or dplyr/tidyverse solution?

JerryWho asked Jan 10 '18 17:01


2 Answers

Well, apply() is meant for matrices and doesn't play well with data.frames. It really should be avoided in cases like these. It's better to write functions that take proper parameters rather than require passing data.frame rows.

f <- function(a, b) {
  data.frame(
    x = rep(paste(rep(a, b), collapse=''), b),
    y = 2 * b,
    stringsAsFactors = FALSE
  )
}

Then you can use a more conventional map()-style approach (especially easy when using just two columns):

purrr::map2_df(df$a, df$b, f)

With more columns (and column names that match the parameter names), you can use

purrr::pmap_df(df, f)
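
Since the question also asks about base R, the same idea can be sketched without purrr using Map(). This is only one possible equivalent, not part of the purrr approach above; rows and result are illustrative names:

```r
# Same example data as in the question
df <- data.frame(
  a = c('a', 'b', 'c'),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# The two-argument version of f from this answer
f <- function(a, b) {
  data.frame(
    x = rep(paste(rep(a, b), collapse=''), b),
    y = 2 * b,
    stringsAsFactors = FALSE
  )
}

# Map() calls f(a[i], b[i]) for each i and returns a list of data.frames.
# unname() drops the list names Map() takes from df$a, so rbind() does not
# turn them into row-name prefixes.
rows <- unname(Map(f, df$a, df$b))

# Stack the per-row results into a single data.frame
result <- do.call(rbind, rows)
result
```

Because the columns are passed as separate vectors, b stays numeric and no as.numeric() conversion is needed.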

MrFlick answered Sep 28 '22 04:09


I believe you can do this quite cleanly in data.table:

library(data.table)
setDT(df)
df[ , .(x = rep(paste(rep(a, b), collapse = ''), b), y = 2*b), 
   keyby = seq_len(nrow(df))]
#    seq_len   x y
# 1:       1   a 2
# 2:       2  bb 4
# 3:       2  bb 4
# 4:       3 ccc 6
# 5:       3 ccc 6
# 6:       3 ccc 6

The keyby = seq_len(nrow(df)) part is the clunkiest bit; this in particular is the subject of a few enhancement requests for data.table, e.g., #1063

MichaelChirico answered Sep 28 '22 03:09