I have written some code to create my own descriptive statistics table since the default summary doesn't do what I want.
Now I would like to create a flexible, dynamic function that does this for a varying number of variables.
My code looks like this:
N <- c( length(data1), length(data2), length(data3) )
mean<- c( mean(data1), mean(data2), mean(data3) )
sd <- c( sd(data1), sd(data2), sd(data3) )
min <- c( min(data1), min(data2), min(data3) )
max <- c( max(data1), max(data2), max(data3) )
q <- data.frame(N, mean, sd, min, max)
print(q)
So instead of editing this whenever I want descriptives for something other than three variables, I would like a function that does something like this:
descriptive <- function(data1, ...) {
N <- c( length(data1), length(...) )
mean<- c( mean(data1), mean(...) )
sd <- c( sd(data1), sd(...) )
min <- c( min(data1), min(...) )
max <- c( max(data1), max(...) )
q <- data.frame(N, mean, sd, min, max)
print(q)
}
I tried the above and hoped it would work, but it only works with two variables. As you might see, I am new to R. I have tried to search for a solution, but I've not been able to find one. But if R is as good as "they" say, I think something like this should be possible.
There's probably a function that already does this, but I would like to be able to do it myself. (: Hope someone can help me!
EDIT!!
Thank you all for your answers, they all seem to work. This shows there are multiple answers to the same question in R. I don't know if you get points for the accepted answer and if this is important, but I chose Arun's answer since it comes closest to my aim of creating a descriptive table that is "good looking" and flexible.
If anyone in the future is interested, I've added this to Arun's answer, which makes it fit my purpose perfectly:
data <- list(var1, var2, ...)
names <- c("name1", "name2", "...")
descriptive(data)
This solution also seems to have the benefit of handling variables of different lengths, which a data frame cannot hold.
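For readers who don't have Arun's answer in front of them, a list-based function of this kind might look roughly like the sketch below. This is an assumption about the general shape, not Arun's actual code; the name `descriptive_list` and the example vectors are made up for illustration. It collects the arguments with `list(...)`, which is what the attempt with `length(...)` was missing, and it assumes the variables are either all named or all unnamed.

```r
# Hypothetical sketch: accepts any number of vectors via `...`,
# or a single list of vectors.
descriptive_list <- function(...) {
  vars <- list(...)
  # If a single list was passed, use its elements as the variables
  if (length(vars) == 1 && is.list(vars[[1]])) vars <- vars[[1]]
  data.frame(
    N    = sapply(vars, length),
    mean = sapply(vars, mean),
    sd   = sapply(vars, sd),
    min  = sapply(vars, min),
    max  = sapply(vars, max),
    row.names = names(vars)   # NULL gives automatic row names
  )
}

# Vectors of different lengths are fine because each is summarised separately
var1 <- c(1, 2, 3, 4)
var2 <- c(10, 20, 30)
descriptive_list(name1 = var1, name2 = var2)
```

Calling `descriptive_list(list(name1 = var1, name2 = var2))` should give the same table, since a single list argument is unpacked first.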
This would be a good opportunity to learn the apply family of functions, so that you can specify your intended output as a function and then apply that to a dataframe.
mydf <- data.frame(x=rnorm(1000), y=rnorm(1000), z=rnorm(1000)) # example data
descriptive <- function(x)
  c(length=length(x), mean=mean(x), sd=sd(x), min=min(x), max=max(x))
sapply(mydf, descriptive) # apply `descriptive` to each column of the df
The output:
                   x             y             z
length  1.000000e+03 1000.00000000 1000.00000000
mean    3.846765e-03   -0.02009427    0.02001385
sd      9.818488e-01    0.97662850    1.01543571
min    -2.905149e+00   -3.25904432   -3.33017918
max     3.235993e+00    2.86892044    3.13183601
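The table in the question had one row per variable and one column per statistic. If that layout is preferred, transposing the `sapply` result is one way to get it. A minimal sketch (the seed and the name `tab` are arbitrary choices for this example):

```r
set.seed(1)  # arbitrary seed so the example is reproducible
mydf <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))

descriptive <- function(x)
  c(length = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))

# sapply() returns a statistics-by-variables matrix; t() flips it so that
# each variable becomes a row, and as.data.frame() turns it into a table
tab <- as.data.frame(t(sapply(mydf, descriptive)))
tab
```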
One caution with this: unless you develop a more sophisticated descriptive function, it won't handle NA values in your data (a single NA makes mean, sd, min and max return NA) and it will cause problems for variables of different classes in the data frame (e.g., the mean of a character vector is NA, with a warning).
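A slightly more defensive version could drop missing values and skip non-numeric columns. The sketch below is one possible way to do that, not the only one; `descriptive_na`, the example data and the choice to report N as the count of non-missing values are assumptions made for illustration.

```r
descriptive_na <- function(x) {
  c(N    = sum(!is.na(x)),          # count of non-missing values
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE))
}

mydf <- data.frame(x = c(1, 2, NA, 4),
                   y = c(10, 20, 30, 40),
                   g = c("a", "b", "c", "d"))

# Keep only the numeric columns before applying the summary
num <- mydf[sapply(mydf, is.numeric)]
sapply(num, descriptive_na)
```

Here the character column `g` is left out of the result, and `x` is summarised over its three non-missing values.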
This is also more efficient than building a function that internally applies to a list of vectors (as Arun suggests) and than the plyr approach (from Baptiste: ldply(mydf, each(length, mean, sd, min, max))):
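library(microbenchmark)
# `thomas` is the `descriptive` function above; `arun` and `baptiste` wrap the other two answers
mydf <- data.frame(x=rnorm(1e5),y=rnorm(1e5),z=rnorm(1e5))
microbenchmark(sapply(mydf,thomas), arun(mydf), baptiste(mydf))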
Unit: milliseconds
                 expr       min        lq    median        uq      max neval
 sapply(mydf, thomas)  5.693252  6.039458  7.139658  7.953309 43.32675   100
           arun(mydf) 15.805778 18.522889 19.417559 22.016125 57.93630   100
       baptiste(mydf) 10.995073 11.597998 12.666252 13.861521 47.85533   100