Flexible functions R

Question

I have written some code to create my own descriptive statistics table since the default summary doesn't do what I want.

Now I would like to create a flexible / dynamic function that does this for a varying number of variables.

My code looks like this:

N <- c( length(data1), length(data2), length(data3) )
mean <- c( mean(data1), mean(data2), mean(data3) )
sd <- c( sd(data1), sd(data2), sd(data3) )
min <- c( min(data1), min(data2), min(data3) )
max <- c( max(data1), max(data2), max(data3) )
q <- data.frame(N, mean, sd, min, max)
print(q)

So instead of editing this whenever I want descriptives for something other than 3 variables, I would like a function that does something like this:

descriptive <- function(data1, ...) {
  N <- c( length(data1), length(...) ) 
  mean<- c( mean(data1), mean(...) )
  sd <- c( sd(data1), sd(...) )
  min <- c( min(data1), min(...) )
  max <- c( max(data1), max(...) )
  q <- data.frame(N, mean, sd, min, max)
  print(q)
}

I tried the above and hoped it would work, but it only works with two variables. As you might see, I am new to R. I have tried to search for a solution, but I've not been able to find one. But if R is as good as "they" say, I think something like this should be possible.

There's probably a function that already does this, but I would like to be able to do it myself. (: Hope someone can help me!

EDIT!!

Thank you all for your answers, they all seem to work. This shows there are multiple answers to the same question in R. I don't know if you get points for the accepted answer and if this is important, but I chose Arun's answer since it comes closest to my aim of creating a descriptive table that is "good looking" and flexible.

If anyone in the future is interested, I've added this to Arun's answer, which makes it fit my purpose perfectly:

data <- list(var1, var2, ...)
names <- c("name1", "name2", "...")
descriptive(data)

This solution also seems to have the benefit of handling variables of different lengths, which a data frame does not allow.

Thomas · Accepted Answer

This would be a good opportunity to learn the apply family of functions, so that you can specify your intended output as a function and then apply that to a dataframe.

mydf <- data.frame(x=rnorm(1000), y=rnorm(1000), z=rnorm(1000)) # example data

descriptive <- function(x)
   c(length=length(x), mean=mean(x), sd=sd(x), min=min(x), max=max(x))

sapply(mydf, descriptive) # apply `descriptive` to the df

The output:

                   x             y             z
length  1.000000e+03 1000.00000000 1000.00000000
mean    3.846765e-03   -0.02009427    0.02001385
sd      9.818488e-01    0.97662850    1.01543571
min    -2.905149e+00   -3.25904432   -3.33017918
max     3.235993e+00    2.86892044    3.13183601

One caution with this is that unless you develop a more sophisticated descriptive function, it won't handle NA values in your data (mean, sd, min, and max all return NA as soon as the input contains one), and it will cause you problems for variables of different classes in the dataframe (e.g., the mean of a character vector is NA).
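One way to address both cautions is sketched below, assuming that missing values should simply be dropped and that non-numeric columns should be skipped. The name `descriptive_na` and the example data are hypothetical, not part of the original answer.

```r
# Hypothetical NA-tolerant variant: drops missing values before summarising
descriptive_na <- function(x) {
  x <- x[!is.na(x)]
  c(length = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# Small example with one NA and one character column
mydf <- data.frame(x = c(1, 2, NA, 4),
                   y = c(10, 20, 30, 40),
                   g = c("a", "b", "c", "d"))

# Restrict to numeric columns so character/factor columns are not summarised
numeric_cols <- vapply(mydf, is.numeric, logical(1))
sapply(mydf[numeric_cols], descriptive_na)
```

Note that `length` here counts only the non-missing values. A column that is entirely NA would still produce warnings from `min` and `max`, so a stricter version might check for that case first.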

The sapply approach is also faster, in the benchmark below, than building a function that internally applies to a list of vectors (as Arun suggests) and than plyr (from Baptiste: ldply(mydf, each(length, mean, sd, min, max))):

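library(microbenchmark)
# `thomas` is the `descriptive` function above; `arun` and `baptiste` are assumed to wrap the other two answers (definitions not shown)
mydf <- data.frame(x=rnorm(1e5),y=rnorm(1e5),z=rnorm(1e5))
microbenchmark(sapply(mydf,thomas), arun(mydf), baptiste(mydf))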

Unit: milliseconds
                 expr       min        lq    median        uq      max neval
 sapply(mydf, thomas)  5.693252  6.039458  7.139658  7.953309 43.32675   100
           arun(mydf) 15.805778 18.522889 19.417559 22.016125 57.93630   100
       baptiste(mydf) 10.995073 11.597998 12.666252 13.861521 47.85533   100
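
Coming back to the original question, which asked for a function taking any number of vectors through `...`: a minimal sketch is to capture the arguments with `list(...)` and reuse the `descriptive` function from above. The wrapper name `descriptive_table` is hypothetical, and labelling rows with the expressions used in the call is only one possible design choice.

```r
# Per-vector summary, as defined earlier in this answer
descriptive <- function(x)
  c(length = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))

# Hypothetical wrapper: accepts any number of numeric vectors via `...`
descriptive_table <- function(...) {
  vars <- list(...)  # capture all arguments as a list
  # Text of each argument expression, used to label unnamed arguments
  labels <- vapply(as.list(substitute(list(...)))[-1],
                   function(e) deparse(e)[1], character(1))
  nms <- names(vars)
  if (is.null(nms)) nms <- rep("", length(vars))
  nms[nms == ""] <- labels[nms == ""]
  names(vars) <- nms
  # One row per variable, one column per statistic
  as.data.frame(t(sapply(vars, descriptive)))
}

a <- c(1, 2, 3, 4)
b <- c(10, 20, 30)  # lengths may differ, unlike columns of a data frame
descriptive_table(a, b, third = c(5, 5, 5, 5, 5))
```

The original attempt failed because `length(...)` and `mean(...)` pass every extra vector to a single call as separate arguments, rather than calling the function once per vector. Collecting the dots into a list first makes the per-vector application explicit.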


Tags:

r

statistics

user-defined-functions

user2624239
