
Apply function to a row in a data.frame using dplyr

In base R I would do the following:

d <- data.frame(a = 1:4, b = 4:1, c = 2:5)
apply(d, 1, which.max)

With dplyr I could do the following:

library(dplyr)
d %>% mutate(u = purrr::pmap_int(list(a, b, c), function(...) which.max(c(...))))

If there’s another column in d, I need to specify it explicitly, but I want this to work with an arbitrary number of columns.

Conceptually, I’d like something like

pmap_int(list(everything()), ...)
pmap_int(list(.), ...)

But this obviously does not work. How would I solve that canonically with dplyr?

thothal asked Apr 03 '21 19:04




2 Answers

A data.frame is a list whose elements are its columns, so we only need to pass the data itself, referred to as . inside the pipe. If we wrap it as list(.), it becomes a nested list, which is why that attempt fails.

library(dplyr)
library(purrr)  # provides pmap_int()
d %>% 
  mutate(u = pmap_int(., ~ which.max(c(...))))
#  a b c u
#1 1 4 2 2
#2 2 3 3 2
#3 3 2 4 3
#4 4 1 5 3
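
To see why passing the data frame directly works while list(.) does not, it can help to inspect the structure. This is a minimal sketch that reuses d from the question; the name u_direct is only illustrative.

```r
library(purrr)

d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# A data frame is a list with one element per column,
# so pmap_int() can iterate over its rows in parallel.
is.list(d)       # TRUE
length(d)        # 3

# Wrapping it in list() gives a list of length 1 whose single
# element is the whole data frame (a nested list).
length(list(d))  # 1

# Passing d itself hands every column to the function via `...`
u_direct <- pmap_int(d, function(...) which.max(c(...)))
```

Because every column arrives through `...`, the same call should keep working if more columns are added to d.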

Or we can use cur_data() (deprecated since dplyr 1.1.0 in favor of pick())

d %>%
   mutate(u = pmap_int(cur_data(), ~ which.max(c(...))))
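
On newer dplyr versions the same idea can be written with pick(). The sketch below assumes dplyr 1.1.0 or later, where pick() was introduced; res is an illustrative name.

```r
library(dplyr)
library(purrr)

d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# pick(everything()) returns the current columns as a data frame,
# so pmap_int() receives one argument per column for each row.
# Assumption: dplyr >= 1.1.0 is installed.
res <- d %>%
  mutate(u = pmap_int(pick(everything()), ~ which.max(c(...))))
```

Since u does not exist yet when it is being computed, pick(everything()) only sees a, b and c here.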

Or, if we want to use everything(), place it inside select(); list(everything()) fails because it does not say which data the columns should be selected from

d %>% 
   mutate(u = pmap_int(select(., everything()), ~ which.max(c(...))))

Or using rowwise

d %>%
    rowwise() %>% 
    mutate(u = which.max(cur_data())) %>%
    ungroup()
# A tibble: 4 x 4
#      a     b     c     u
#  <int> <int> <int> <int>
#1     1     4     2     2
#2     2     3     3     2
#3     3     2     4     3
#4     4     1     5     3
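
A variant of the rowwise approach uses c_across(), which collects the selected columns of the current row into a plain vector. This is only a sketch; res_rowwise is an illustrative name.

```r
library(dplyr)

d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# c_across(everything()) combines the values of the current row
# into one vector, which which.max() can handle directly.
res_rowwise <- d %>%
  rowwise() %>%
  mutate(u = which.max(c_across(everything()))) %>%
  ungroup()
```

Like any rowwise pipeline, this evaluates the expression once per row, so it may be slow on large data.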

Or, more efficiently, with the vectorized max.col()

max.col(d, 'first')
#[1] 2 2 3 3
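
The 'first' argument matters because max.col() breaks ties at random by default, whereas which.max() always returns the first maximum. Row 2 of d has a tie between b and c, so it makes a small test case. The variable names below are only illustrative.

```r
d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# Row 2 is (2, 3, 3): columns b and c tie for the maximum.
first_max <- max.col(d, ties.method = "first")  # 2 2 3 3
last_max  <- max.col(d, ties.method = "last")   # 2 3 3 3

# which.max() returns the first maximum, so only "first" matches it.
row_wise <- apply(d, 1, which.max)
```

With the default ties.method = "random", the result for row 2 could be either 2 or 3 from one run to the next.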

Or with collapse

library(collapse)
dapply(d, which.max, MARGIN = 1)
#[1] 2 2 3 3

The max.col() approach can be included in a dplyr pipeline as

d %>% 
    mutate(u = max.col(cur_data(), 'first'))
akrun answered Sep 18 '22 20:09


Here are some data.table options

library(data.table)
setDT(d)[, u := which.max(unlist(.SD)), by = 1:nrow(d)]

or

setDT(d)[, u := max.col(.SD, "first")]
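
Note that setDT() converts d to a data.table by reference, so the original object is modified. If that is not wanted, a copy can be used instead. This is a sketch; dt is an illustrative name.

```r
library(data.table)

d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# as.data.table() returns a new data.table, so d itself
# stays an ordinary data.frame without the new column.
dt <- as.data.table(d)

# .SD holds all columns here; max.col() picks the first maximum per row.
dt[, u := max.col(.SD, ties.method = "first")]
```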
ThomasIsCoding answered Sep 19 '22 20:09