I have a data frame like the following:
a b1 b2 b3 b4 b5 b6 b7 b8 b9
D 4 6 9 5 3 9 7 9 8
F 7 3 8 1 3 1 4 4 3
R 2 5 5 1 4 2 3 1 6
D 9 2 1 4 3 3 8 2 5
D 5 4 3 1 6 4 1 8 3
R 3 7 9 1 8 5 3 4 2
D 4 1 8 2 6 3 2 7 5
F 7 1 7 2 7 1 6 2 4
D 6 3 9 3 9 9 7 1 2
The function
tapply(df[,2], INDEX = df$a, sum)
works fine to produce a table that sums everything in df[,2] by df$a, but when I try
tapply(df[,2:10], INDEX = df$a, sum)
to get a similar table, except with a sum for each column (2, 3, 4, ..., 10), I get an error message reading:
Error in tapply(df[, 2:10], INDEX = df$a, sum) : arguments must have same length
Additionally, I would like the row names of the table to be the column names of df[,2:10], such that row 1 is b1, row 2 is b2, and row 9 is b9.
Tapply in R with multiple factors
You can apply the tapply function to multiple columns (or factor variables) by passing them through the list function. In this example, we apply the tapply function to the type and store factors to calculate the mean price of the objects by type and store:
tapply(price, list(type, store), mean)
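To make the multi-factor call concrete, here is a minimal sketch. The price, type, and store vectors are invented for illustration and are not from the original post.

```r
# Hypothetical data, made up purely for illustration.
price <- c(10, 20, 30, 40, 50, 60)
type  <- factor(c("toy", "toy", "book", "book", "toy", "book"))
store <- factor(c("A", "B", "A", "B", "A", "B"))

# Passing a list of factors makes tapply cross-tabulate:
# rows are the levels of type, columns are the levels of store.
mean_price <- tapply(price, list(type, store), mean)
print(mean_price)
```

Note that this groups one vector by several factors. It does not help with the question above, which is about summarising several columns by one factor.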
There are a lot of different ways to transform the output of a tapply call into a data.frame. But it's much simpler to avoid the call to tapply in the first place and use a similar function that returns a data frame instead of a vector: just change your function call from tapply to aggregate (which takes the grouping as a list in its by argument rather than INDEX).
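That's because tapply expects a vector as its first argument. A data frame such as df[,2:10] is treated as a list of columns, so its length is the number of columns, which generally does not match the length of df$a (the number of rows). Besides that, sum applied to a data frame gives you the total over all cells, not the sum per column. Use aggregate() instead, e.g.: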
aggregate(df[,2:10],by=list(df$a), sum)
If you want a list returned, you could use by() for that. Make sure to specify colSums instead of sum, because by works on a split data frame: each piece is itself a data frame, so sum would add up all of its cells.
by(df[,2:10],df$a,FUN=colSums)
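Neither call above gives the layout asked for in the question, with b1 to b9 as row names. One possible way to get there, sketched on the sample data from the question, is to transpose the aggregate() result. The names sums and result are just placeholders chosen for this example.

```r
# Rebuild the sample data frame from the question.
df <- read.table(header = TRUE, text = "
a b1 b2 b3 b4 b5 b6 b7 b8 b9
D 4 6 9 5 3 9 7 9 8
F 7 3 8 1 3 1 4 4 3
R 2 5 5 1 4 2 3 1 6
D 9 2 1 4 3 3 8 2 5
D 5 4 3 1 6 4 1 8 3
R 3 7 9 1 8 5 3 4 2
D 4 1 8 2 6 3 2 7 5
F 7 1 7 2 7 1 6 2 4
D 6 3 9 3 9 9 7 1 2
")

# Per-group column sums: one row per level of df$a, one column per b*.
sums <- aggregate(df[, 2:10], by = list(a = df$a), FUN = sum)

# Drop the grouping column, transpose so that rows are b1..b9,
# and label the columns with the group names.
result <- t(sums[, -1])
colnames(result) <- as.character(sums$a)
print(result)
```

The result is a matrix rather than a data frame. Wrap it in as.data.frame() if you need a data frame.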