 

How to run tapply() on multiple columns of data frame using R?


I have a data frame like the following:

a   b1  b2  b3  b4  b5  b6  b7  b8  b9
D   4   6   9   5   3   9   7   9   8
F   7   3   8   1   3   1   4   4   3
R   2   5   5   1   4   2   3   1   6
D   9   2   1   4   3   3   8   2   5
D   5   4   3   1   6   4   1   8   3
R   3   7   9   1   8   5   3   4   2
D   4   1   8   2   6   3   2   7   5
F   7   1   7   2   7   1   6   2   4
D   6   3   9   3   9   9   7   1   2
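To make the examples runnable, the table above can be read into R as a data frame named df (the name used in the question). This is a sketch using base R's read.table() with the text= argument, so the data can be pasted in directly:

```r
# Recreate the sample data from the question as a data frame called df.
# Column a holds the groups (D, F, R); b1 ... b9 are the value columns.
df <- read.table(header = TRUE, text = "
a   b1  b2  b3  b4  b5  b6  b7  b8  b9
D   4   6   9   5   3   9   7   9   8
F   7   3   8   1   3   1   4   4   3
R   2   5   5   1   4   2   3   1   6
D   9   2   1   4   3   3   8   2   5
D   5   4   3   1   6   4   1   8   3
R   3   7   9   1   8   5   3   4   2
D   4   1   8   2   6   3   2   7   5
F   7   1   7   2   7   1   6   2   4
D   6   3   9   3   9   9   7   1   2
")

str(df)  # check the structure: 9 rows, column a plus b1 ... b9
```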

The function tapply(df[,2], INDEX = df$a, sum) works fine: it produces a table of the sums of df[,2] for each value of df$a. But when I try tapply(df[,2:10], INDEX = df$a, sum) to get a similar table, except with a sum for each column (2, 3, 4, ..., 10), I get an error message reading:

Error in tapply(df[, 2:10], INDEX = df$a, sum) : arguments must have same length

Additionally, I would like the row names of the table to be the column names of df[,2:10], such that row 1 is b1, row 2 is b2, and row 9 is b9.

asked Aug 11 '11 16:08 by Jota




1 Answer

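That's because tapply() expects X to be a vector with one element per element of INDEX. A data frame is a list of columns, so length(df[,2:10]) is the number of columns rather than the number of rows, which is why the lengths don't match. Besides that, sum would give you the total over all the columns, not the sum per column. Use aggregate() instead, e.g.: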

aggregate(df[,2:10],by=list(df$a), sum)
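If you'd rather stay with tapply(), one option is to apply it to one column at a time: sapply() loops over the columns and calls tapply(column, df$a, sum) on each, which gives a matrix with one row per group and one column per b column. Transposing it with t() gives the layout asked for in the question, with b1 ... b9 as row names. A minimal sketch, assuming df is the data frame from the question (recreated here so the snippet runs on its own):

```r
df <- read.table(header = TRUE, text = "
a   b1  b2  b3  b4  b5  b6  b7  b8  b9
D   4   6   9   5   3   9   7   9   8
F   7   3   8   1   3   1   4   4   3
R   2   5   5   1   4   2   3   1   6
D   9   2   1   4   3   3   8   2   5
D   5   4   3   1   6   4   1   8   3
R   3   7   9   1   8   5   3   4   2
D   4   1   8   2   6   3   2   7   5
F   7   1   7   2   7   1   6   2   4
D   6   3   9   3   9   9   7   1   2
")

# sapply() passes each column of df[, 2:10] to tapply(column, df$a, sum).
# Each call returns the sums per group; sapply() binds them into a matrix
# with the groups (D, F, R) as rows and b1 ... b9 as columns.
per_group <- sapply(df[, 2:10], tapply, df$a, sum)

# t() flips the matrix so b1 ... b9 become the row names
res <- t(per_group)
res
```

Unlike aggregate(), which returns a data frame with the grouping variable as its first column, this returns a plain numeric matrix.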

If you want a list returned, you could use by() for that. Make sure to specify colSums instead of sum, as by() passes each subset of the data frame (one per group) to FUN:

by(df[,2:10],df$a,FUN=colSums)
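by() gives one colSums() vector per group. If you would rather have them in a single table with b1 ... b9 as rows, as the question asks, base R's rowsum() computes per-group column sums in one call, and t() turns the groups into columns. A sketch, again recreating df from the question so it runs on its own:

```r
df <- read.table(header = TRUE, text = "
a   b1  b2  b3  b4  b5  b6  b7  b8  b9
D   4   6   9   5   3   9   7   9   8
F   7   3   8   1   3   1   4   4   3
R   2   5   5   1   4   2   3   1   6
D   9   2   1   4   3   3   8   2   5
D   5   4   3   1   6   4   1   8   3
R   3   7   9   1   8   5   3   4   2
D   4   1   8   2   6   3   2   7   5
F   7   1   7   2   7   1   6   2   4
D   6   3   9   3   9   9   7   1   2
")

# rowsum() adds up the rows of df[, 2:10] within each level of df$a;
# the result is a data frame with one row per group (sorted: D, F, R)
group_sums <- rowsum(df[, 2:10], df$a)

# t() converts it to a matrix with b1 ... b9 as rows and the groups as columns
res <- t(group_sums)
res
```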
answered Sep 23 '22 02:09 by Joris Meys