I have a function with several parameters. This function returns a data.frame.
I have another data.frame.
Now I would like to call my function once for each row of that data.frame, using the row's values as the arguments, and rbind the resulting data.frames.
So I thought something like
do.call(rbind, apply(df, 1, f))
is my friend.
But during this call df is converted to a matrix, and in the process all numbers are converted to character strings. So I have to modify my function to convert them back. That's clumsy, and I'm afraid I'm missing something.
So my question is, how can I do this?
As an example, see the following code:
Sys.setenv(LANG = "en")
# Create data.frame
df <- data.frame(
  a = c('a', 'b', 'c'),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)
# My function
f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[['a']], x[['b']]), collapse = ''), x[['b']]),
    y = 2 * x[['b']],
    stringsAsFactors = FALSE
  )
}
apply(df, 1, f)
Here I get the error:
Error in 2 * x[["b"]] : non-numeric argument to binary operator
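The error can be traced back to the coercion step: apply() turns a data.frame into a matrix via as.matrix(), and a matrix can hold only one type. A minimal check, assuming the same df as above (the name m is just for illustration):

```r
# Same example data as in the question
df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# apply() works on a matrix, so the mixed-type data.frame is coerced first
m <- as.matrix(df)

typeof(m)          # every cell, including column b, is now character
class(df[1, "b"])  # indexing the data.frame directly keeps the numeric type
```

So inside f, x[['b']] is a string such as "1", which is why 2 * x[['b']] fails.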
So I change function f to function g:
g <- function(x) {
  data.frame(
    x = rep(paste(rep(x[['a']], as.numeric(x[['b']])), collapse = ''), as.numeric(x[['b']])),
    y = 2 * as.numeric(x[['b']]),
    stringsAsFactors = FALSE
  )
}
Now I can call
do.call(rbind, apply(df, 1, g))
and I get
x y
1 a 2
2 bb 4
3 bb 4
4 ccc 6
5 ccc 6
6 ccc 6
I tried to use a for-loop.
result <- f(df[1,])
for (i in 2:nrow(df)) {
  result <- rbind(result, f(df[i, ]))
}
result
That does work, but this can't be the R way. for-loops aren't "R-ish", and there's too much that can go wrong: df could be empty or have only one row.
So what's the base-R or dplyr/tidyverse solution?
Well, apply() is meant for matrices and doesn't play well with data.frames. It really should be avoided in cases like these. It's better to write functions that take proper parameters rather than requiring data.frame rows to be passed in.
f <- function(a, b) {
  data.frame(
    x = rep(paste(rep(a, b), collapse = ''), b),
    y = 2 * b,
    stringsAsFactors = FALSE
  )
}
Then you can use a more conventional map()-style approach (especially easy when using just two columns):
purrr::map2_df(df$a, df$b, f)
With more columns (and column names that match the parameter names), you can use
purrr::pmap_df(df, f)
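Since the question also asks about base R, here is a minimal sketch of the same idea without purrr, assuming the two-argument f defined above. Map() walks the columns in parallel and do.call(rbind, ...) stacks the pieces; unname() is there only because Map() would otherwise name the list after the values of a, and those names would leak into the row names.

```r
# Example data and the two-argument function from this answer
df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

f <- function(a, b) {
  data.frame(
    x = rep(paste(rep(a, b), collapse = ""), b),
    y = 2 * b,
    stringsAsFactors = FALSE
  )
}

# One call to f per row: Map() pairs up df$a[i] with df$b[i]
pieces <- unname(Map(f, df$a, df$b))

# Stack the per-row data.frames into one result
result <- do.call(rbind, pieces)
result
```

This also works for a one-row df. For a zero-row df the list of pieces is empty, so do.call(rbind, ...) should return NULL rather than an empty data.frame; add a guard if that case matters.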
I believe you can do this quite cleanly in data.table:
library(data.table)
setDT(df)
df[ , .(x = rep(paste(rep(a, b), collapse = ''), b), y = 2*b),
    keyby = seq_len(nrow(df))]
# seq_len x y
# 1: 1 a 2
# 2: 2 bb 4
# 3: 2 bb 4
# 4: 3 ccc 6
# 5: 3 ccc 6
# 6: 3 ccc 6
The keyby = seq_len(nrow(df)) part is the clunkiest bit; this in particular is the subject of a few enhancement requests for data.table, e.g., #1063.