Writing function around statistical tests in R

Tags: function, r

I'm writing a function for my (working) R script in order to clean up my code. I have no experience writing functions, but decided I should invest some time in learning. The goal of my function is to perform multiple statistical tests while passing the required data frame, quantitative variable and grouping variable only once. However, I cannot get this to work. For reference, I'll use the ToothGrowth data frame to illustrate my problem.

Say I want to run a Kruskal-Wallis test and a one-way ANOVA on len to compare the groups defined by supp. I can do this separately with

kruskal.test(len ~ supp, data = ToothGrowth)
aov(len ~ supp, data = ToothGrowth)

Now I want to write a function that performs both tests. This is what I thought would work:

stat_test <- function(mydata, quantvar, groupvar) {
  kruskal.test(quantvar ~ groupvar, data = mydata)
  aov(quantvar ~ groupvar, data = mydata)
}

But if I then run stat_test(ToothGrowth, "len", "supp"), I get the error

Error in kruskal.test.default("len", "supp") : 
  all observations are in the same group 

What am I doing wrong? Any help would be much appreciated!

Eydise asked Nov 29 '25 00:11


1 Answer

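Inside your function, quantvar ~ groupvar is a formula in the argument names quantvar and groupvar themselves, not in the columns whose names they hold. Because mydata has no columns called quantvar or groupvar, R looks them up in the function's environment and finds the single strings "len" and "supp", which is why kruskal.test() reports that all observations are in the same group.

You can use deparse(substitute(quantvar)) to get the name of the column you pass to the function as a character string, and then build the formula with paste() and as.formula(). This lets you call the function with bare, unquoted column names (stat_test(ToothGrowth, len, supp)), which is the more idiomatic way of working in R.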
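
The sketch below contrasts the two approaches using two hypothetical helpers, formula_as_written() and arg_name(), which are not part of the original code. The first builds a formula the same way the question's function does, and all.vars() shows that it refers to the symbols quantvar and groupvar rather than len and supp. The second shows that deparse(substitute(x)) returns the unevaluated argument as a string.

```r
# Hypothetical helper mirroring the question's function: the formula is
# built from the argument *names*, not from the strings they hold.
formula_as_written <- function(quantvar, groupvar) quantvar ~ groupvar

# Hypothetical helper: substitute() captures the unevaluated argument
# expression and deparse() turns it into a character string.
arg_name <- function(x) deparse(substitute(x))

all.vars(formula_as_written("len", "supp"))
#> [1] "quantvar" "groupvar"

# len is never evaluated, so it does not need to exist as an object
arg_name(len)
#> [1] "len"
```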

Here's a reproducible example:

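stat_test <- function(mydata, quantvar, groupvar) {
  # Turn the unquoted column names into strings and build e.g. len ~ supp
  A <- as.formula(paste(deparse(substitute(quantvar)), "~",
                        deparse(substitute(groupvar))))
  # Inside a function only the last value is returned, so print the
  # Kruskal-Wallis result explicitly or it would be discarded
  print(kruskal.test(A, data = mydata))
  cat("\n--------------------------------------\n\n")
  # The aov fit is the last expression, so it is the return value
  aov(A, data = mydata)
}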

stat_test(ToothGrowth, len, supp)
#> 
#>  Kruskal-Wallis rank sum test
#> 
#> data:  len by supp
#> Kruskal-Wallis chi-squared = 3.4454, df = 1, p-value = 0.06343
#> 
#> 
#> --------------------------------------
#> Call:
#>    aov(formula = A, data = mydata)
#> 
#> Terms:
#>                     supp Residuals
#> Sum of Squares   205.350  3246.859
#> Deg. of Freedom        1        58
#> 
#> Residual standard error: 7.482001
#> Estimated effects may be unbalanced

Created on 2020-03-30 by the reprex package (v0.3.0)
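
If you would rather keep passing quoted column names, as in the question's stat_test(ToothGrowth, "len", "supp"), a minimal sketch using stats::reformulate() could look like the following. The function name stat_test_str is hypothetical. Returning both results in a named list, rather than printing one of them, also sidesteps the fact that a function returns only the value of its last expression, which is why the question's version would have silently dropped the Kruskal-Wallis result even with a correct formula.

```r
# Hypothetical variant: takes column names as character strings.
stat_test_str <- function(mydata, quantvar, groupvar) {
  # reformulate() builds a formula such as len ~ supp from strings
  f <- reformulate(groupvar, response = quantvar)
  # Return both fits so neither result is discarded
  list(
    kruskal = kruskal.test(f, data = mydata),
    anova   = aov(f, data = mydata)
  )
}

res <- stat_test_str(ToothGrowth, "len", "supp")
res$kruskal
summary(res$anova)
```

The caller can then inspect either element, for example res$kruskal$p.value.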

Allan Cameron answered Nov 30 '25 16:11


