 

Using dplyr within a user-defined function to summarise data and then plot it

I am trying to write a user-defined function that takes multiple arguments, summarises data with dplyr, and then plots the result with ggplot.

Here is some sample data, along with the dplyr summary and the plot I want to reproduce:

library(dplyr)
library(ggplot2)

df <- data.frame(
  Year = c("2006", "2006", "2006", "2007", "2007", "2007", "2008", "2009", "2010", "2010", "2009", "2009"),
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff", "Neutral", "Defense", "Plaintiff",
                          "Defense", "Plaintiff", "Neutral", "Neutral", "Plaintiff", "Defense"),
  Loss = c(100000, 100, 2500, 100000, 25000, 0, 7500, 5200, 900, 100, 0, 50))

# Mean loss for each Year / JudicialOrientation combination
df1 <- df %>%
  group_by(Year, JudicialOrientation) %>%
  summarise(MeanLoss = mean(Loss))

ggplot(df1, aes(x = JudicialOrientation, y = MeanLoss, color = Year, group = Year)) +
  geom_line() +
  geom_point()

I am now trying to turn this into a function so that I can pass in different variables and get similar results.

Here is my attempt so far:

ConsistencyPlot <- function(df,var1,timevar,lossvar){

  df1 <- df %>%
    group_by_(df[timevar], df[var1]) %>%
    summarise_(MeanLoss = mean(df[lossvar]))

  ggplot(df1, aes(x = var1, y = MeanLoss, color = timevar, group = timevar)) +
    geom_line() +
    geom_point()

}

ConsistencyPlot(df,"JudicialOrientation","Year",'Loss')

I am replicating the same logic, passing in df as my data frame, var1 as JudicialOrientation, timevar as Year, and lossvar as the Loss column whose values I want averaged with summarise. However, I cannot get the same results, so I suspect I am missing something about how these functions are used within a closure.

Coldchain9 asked Dec 22 '25 21:12


1 Answer

First of all, inside dplyr verbs you don't need to index the data frame (as in df[, timevar]) to refer to a column; just use the column name. Besides that, df[timevar] does not give you the column as a vector: single-bracket indexing with one argument returns a one-column data frame, so mean(df[lossvar]) returns NA with a warning instead of the mean loss.
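A quick illustration of the indexing point, using a small hypothetical data frame `toy` rather than the question's data:

```r
# Hypothetical two-row data frame, used only for illustration
toy <- data.frame(Year = c("2006", "2007"), Loss = c(100, 300))

# Single-bracket indexing with one argument selects columns
# but keeps the data frame structure
class(toy["Loss"])                    # "data.frame"

# mean() expects a numeric vector, so a data frame gives NA plus a warning
suppressWarnings(mean(toy["Loss"]))   # NA

# Double brackets or $ extract the column as a plain numeric vector
mean(toy[["Loss"]])                   # 200
mean(toy$Loss)                        # 200
```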

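As for the function itself, the problem is one of evaluation. dplyr verbs and ggplot2's aes() use non-standard (tidy) evaluation: they look up bare names as columns of the data, whereas var1, timevar and lossvar here hold character strings. For example, aes(x = var1) maps the constant string "JudicialOrientation" rather than the column of that name.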
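To see the evaluation problem in isolation, here is a minimal sketch with a hypothetical `toy` data frame and a column name held in a string, as in the question. It contrasts using the string variable directly with dplyr's `.data` pronoun, which is an alternative to the `enquo()` approach and assumes dplyr 0.7 or later:

```r
library(dplyr)

# Hypothetical data, used only for illustration
toy <- data.frame(Year = c("2006", "2006", "2007"), Loss = c(100, 300, 50))
lossvar <- "Loss"   # column name stored as a string

# `lossvar` is not a column of `toy`, so summarise() finds it in the
# calling environment: mean() receives the string "Loss" and returns NA
bad <- toy %>%
  summarise(MeanLoss = suppressWarnings(mean(lossvar)))

# The .data pronoun looks the string up as a column name instead
good <- toy %>%
  group_by(Year) %>%
  summarise(MeanLoss = mean(.data[[lossvar]]))
```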

The following version works:

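library(dplyr)
library(ggplot2)

ConsistencyPlot <- function(df, var1, timevar, lossvar){
  # Capture the bare column names passed by the caller as quosures
  var1 <- enquo(var1)
  timevar <- enquo(timevar)
  lossvar <- enquo(lossvar)

  # Unquote them with !! so dplyr evaluates them as columns of df
  df1 <- df %>%
    group_by(!!timevar, !!var1) %>%
    summarise(MeanLoss = mean(!!lossvar))

  # aes() also accepts the quosures unquoted with !!
  ggplot(df1, aes(x = !!var1, y = MeanLoss, color = !!timevar, group = !!timevar)) +
    geom_line() +
    geom_point()
}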

Note that the parameters are captured with enquo() and then unquoted with !! inside group_by(), summarise() and aes(). This lets you pass the arguments as bare column names, without quotes:

ConsistencyPlot(df, JudicialOrientation, Year, Loss)
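In newer versions of rlang (0.4.0 and later), the enquo()/!! pair can be collapsed into the {{ }} ("curly-curly") operator, which recent dplyr and ggplot2 releases also accept. A sketch of the same function in that style; the name `ConsistencyPlot2` is hypothetical, and `.groups = "drop"` assumes dplyr 1.0.0 or later:

```r
library(dplyr)
library(ggplot2)

# Same logic as ConsistencyPlot, but {{ }} captures and unquotes
# each argument in a single step
ConsistencyPlot2 <- function(df, var1, timevar, lossvar) {
  df1 <- df %>%
    group_by({{ timevar }}, {{ var1 }}) %>%
    summarise(MeanLoss = mean({{ lossvar }}), .groups = "drop")

  ggplot(df1, aes(x = {{ var1 }}, y = MeanLoss,
                  color = {{ timevar }}, group = {{ timevar }})) +
    geom_line() +
    geom_point()
}
```

It is called exactly like the enquo() version: `ConsistencyPlot2(df, JudicialOrientation, Year, Loss)`.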

I hope you find it useful.
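If you would rather keep calling the function with quoted column names, as in the original `ConsistencyPlot(df,"JudicialOrientation","Year",'Loss')` call, one option is to turn the strings into symbols with `rlang::sym()` and unquote them with !! in the same way. This is a sketch: the name `ConsistencyPlot_str` is hypothetical, and `.groups = "drop"` assumes dplyr 1.0.0 or later:

```r
library(dplyr)
library(ggplot2)

# Variant that takes column names as character strings
ConsistencyPlot_str <- function(df, var1, timevar, lossvar) {
  # Convert each string into a symbol so it can be unquoted with !!
  var1 <- rlang::sym(var1)
  timevar <- rlang::sym(timevar)
  lossvar <- rlang::sym(lossvar)

  df1 <- df %>%
    group_by(!!timevar, !!var1) %>%
    summarise(MeanLoss = mean(!!lossvar), .groups = "drop")

  ggplot(df1, aes(x = !!var1, y = MeanLoss, color = !!timevar, group = !!timevar)) +
    geom_line() +
    geom_point()
}
```

With this variant the question's original call works unchanged apart from the function name: `ConsistencyPlot_str(df, "JudicialOrientation", "Year", "Loss")`.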

Bruno Pinheiro answered Dec 24 '25 09:12


