
How can I make the output from tapply() into a data.frame


I have a data.frame in R that looks like this:

          score    rms  template   aln_id       description
    1  -261.410  4.951 2f22A.pdb  2F22A_1 S_00001_0000002_0
    2  -231.987 21.813 1wb9A.pdb  1WB9A_4 S_00002_0000002_0
    3  -263.722  4.903 2f22A.pdb  2F22A_3 S_00003_0000002_0
    4  -269.681 17.732 1wbbA.pdb  1WBBA_6 S_00004_0000002_0
    5  -258.621 19.098 1rxqA.pdb  1RXQA_3 S_00005_0000002_0
    6  -246.805  6.889 1rxqA.pdb 1RXQA_15 S_00006_0000002_0
    7  -281.300 16.262 1wbdA.pdb 1WBDA_11 S_00007_0000002_0
    8  -271.666  4.193 2f22A.pdb  2F22A_2 S_00008_0000002_0
    9  -277.964 13.066 1wb9A.pdb  1WB9A_5 S_00009_0000002_0
    10 -261.024 17.153 1yy9A.pdb  1YY9A_2 S_00001_0000003_0

I can calculate summary statistics on the data.frame like this:

    > tapply( d$score, d$template, mean )
    1rxqA.pdb 1wb9A.pdb 1wbbA.pdb 1wbdA.pdb 1yy9A.pdb 2f22A.pdb
    -252.7130 -254.9755 -269.6810 -281.3000 -261.0240 -265.5993

Is there an easy way to coerce this output back into a data.frame? I'd like it to have these two columns:

d$template mean 

I love tapply, but right now I'm cutting and pasting the results from tapply into a text file and hacking it up a bit to get the summary statistics that I want with appropriate names. This feels very wrong, and I'd like to do something better!

James Thompson asked Apr 11 '10 18:04





1 Answer

There are a lot of different ways to transform the output from a tapply call into a data.frame.
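As a rough sketch of two such conversions, the snippet below rebuilds the `score` and `template` columns from the question and turns the tapply result into a data frame with the columns `template` and `mean`. The object names `tx`, `res1`, and `res2` are illustrative only, and other approaches would work equally well.

```r
# Scores and templates copied from the ten rows shown in the question.
d <- data.frame(
  score = c(-261.410, -231.987, -263.722, -269.681, -258.621,
            -246.805, -281.300, -271.666, -277.964, -261.024),
  template = c("2f22A.pdb", "1wb9A.pdb", "2f22A.pdb", "1wbbA.pdb", "1rxqA.pdb",
               "1rxqA.pdb", "1wbdA.pdb", "2f22A.pdb", "1wb9A.pdb", "1yy9A.pdb")
)

# tapply returns a named 1-d array: one mean per template.
tx <- tapply(d$score, d$template, mean)

# Option 1: assemble the data frame by hand from the names and the values.
res1 <- data.frame(template = names(tx), mean = as.vector(tx), row.names = NULL)

# Option 2: convert to a table first, then let as.data.frame() unpack it.
# The grouping column comes out as "Var1" here, so it is renamed afterwards.
res2 <- as.data.frame(as.table(tx), responseName = "mean")
names(res2)[1] <- "template"

print(res1)
```

Option 1 keeps `template` as a character column, while Option 2 yields a factor, which may matter for later joins or plotting.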

But it's much simpler to avoid the call to tapply in the first place and use a similar function that returns a data frame instead of a vector. More specifically:

  • tapply returns a named vector (strictly speaking, a 1-d array)

  • aggregate returns a data frame

So just change your function call from tapply to aggregate, like so:

    data(iris)  # 'iris' ships with the 'datasets' package

    tx = tapply(iris$Sepal.Length, list(iris$Species), mean)
    tx
    #     setosa versicolor  virginica
    #      5.006      5.936      6.588
    class(tx)
    # "array" (a named 1-d array, not a data frame)

    # note the capital L in Sepal.Length: column names are case-sensitive
    tx = aggregate(iris$Sepal.Length, list(iris$Species), mean)
    tx
    #      Group.1     x
    # 1     setosa 5.006
    # 2 versicolor 5.936
    # 3  virginica 6.588
    class(tx)
    # "data.frame"
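The default column names `Group.1` and `x` are rarely what you want. One possible way around that, sketched here on the same iris data, is the formula interface of aggregate, which keeps the original column names; the value column can then be renamed. The name `res` is illustrative only.

```r
data(iris)

# The formula interface names the columns after the variables in the formula.
res <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# Rename the value column to "mean", matching the layout asked for in the question.
names(res)[names(res) == "Sepal.Length"] <- "mean"

print(res)
```

For the data in the question, the equivalent call would presumably be `aggregate(score ~ template, data = d, FUN = mean)`.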
doug answered Oct 06 '22 15:10
