 

How to apply a function to the entire table in a dplyr chain

Tags: r, dplyr

I have a dplyr chain as follows:

myResults <- rawData %>% filter(stuff) %>% mutate(stuff)

Now I want to apply a function myFunc to myResults. Is there a way to do that within the chain, or do I basically need to do:

myResults <- myFunc(myResults)

user1357015 asked Jun 20 '15 00:06


2 Answers

If the function takes a data frame as its first argument, you can simply add it to the end of the chain:

> myFunc <- function(x) sapply(x, max)
> mtcars  %>% filter(mpg > 20) %>%  myFunc()
    mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear 
 33.900   6.000 258.000 113.000   4.930   3.215  22.900   1.000   1.000   5.000 
   carb 
  4.000 

It is worth mentioning that magrittr's %>% operator, which dplyr uses, works with any kind of object, not only data frames, so you can easily do something like this:

> inc <- function(x) x + 1
> 1 %>% inc(.) %>% sqrt(.) %>% log(.)
[1] 0.3465736

and with some useful magrittr aliases:

library(magrittr)
set.seed(1)
inTrain <- sample(1:nrow(mtcars), 20)
mtcarsTest <- mtcars %>% extract(-inTrain, )

summaryPipe <- function(x) {print(summary(x)); x}

mtcars %>%
    extract(inTrain, ) %>% 
    # Train lm
    lm(mpg ~ ., .) %>%
    # Print summary and forward lm results
    summaryPipe %>%
    # Predict on the test set
    predict(newdata = mtcarsTest) %>%
    # Print results and forward arguments
    print %>%
    # Compute RMSE
    subtract(mtcarsTest %>% extract2('mpg')) %>%
    raise_to_power(2) %>%
    mean %>%
    sqrt

It is probably a matter of taste but personally I find it rather useful.

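As @BondedDust mentioned in the comments, there are three possible ways of passing a function to %>%. All three appear in the examples above: a bare function name (summaryPipe), a call with empty parentheses (myFunc()), and a call with an explicit dot (inc(.)). With the dot placeholder you can put the LHS in a position other than the first argument (see the lm call).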
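
As a minimal sketch of the point above, the snippet below pipes the same value through the three call forms and then uses the dot to send the LHS to a non-first argument. The names r1, r2, r3 and fit are introduced here for illustration only, and the formula mpg ~ wt is an arbitrary choice, not taken from the original answer.

```r
library(magrittr)

# Same helper as in the answer above
inc <- function(x) x + 1

# Three equivalent ways to pipe a value into a function
r1 <- 1 %>% inc       # bare function name
r2 <- 1 %>% inc()     # call with empty parentheses
r3 <- 1 %>% inc(.)    # call with an explicit dot placeholder

# The dot sends the LHS to an argument other than the first:
# here mtcars becomes the `data` argument of lm(), not the formula.
fit <- mtcars %>% lm(mpg ~ wt, data = .)
```

All three of r1, r2 and r3 hold the same value, and fit is an ordinary lm object that can be piped further (for example into summary or predict).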

zero323 answered Jun 17 '23 02:06


You can use the existing functions summarise_each or mutate_each to apply a function to all columns, or to a selected subset of columns:

   library(dplyr)
   mtcars %>% 
     filter(mpg > 20) %>%
     summarise_each(funs(max))
   #   mpg cyl disp  hp drat    wt qsec vs am gear carb
   #1 33.9   6  258 113 4.93 3.215 22.9  1  1    5    4

Or pass in a user-defined function:

  myFunc1 <- function(x) max(x)
  mtcars %>% 
     filter(mpg > 20) %>%
     summarise_each(funs(myFunc1))
 #   mpg cyl disp  hp drat    wt qsec vs am gear carb
 #1 33.9   6  258 113 4.93 3.215 22.9  1  1    5    4
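
In more recent dplyr releases, summarise_each() and funs() are deprecated. A rough equivalent, assuming dplyr 1.0.0 or later is installed, might use across() instead. The name res is introduced here for illustration; myFunc1 is the same helper as above.

```r
library(dplyr)

# Same user-defined helper as above
myFunc1 <- function(x) max(x)

# Apply myFunc1 to every column of the filtered data
# (assumes dplyr >= 1.0.0, where across() is available)
res <- mtcars %>%
  filter(mpg > 20) %>%
  summarise(across(everything(), myFunc1))

print(res)
```

To restrict this to a subset of columns, everything() can be swapped for a selection such as c(mpg, hp).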

akrun answered Jun 17 '23 01:06