Flatten/denormalize the result of the R aggregate function

Tags: r, aggregate, reshape, plyr

I'm fairly new to R and I'm trying to use aggregate to perform some time series shaping on a dataframe, per subject and for each metric in my dataset. This works beautifully, but I find that the result isn't in a format that's very easy to use. I'd like to be able to transform the results back into the same format as the original dataframe.

Using the iris dataset as an example:

# Split into two data frames, one for metrics, the other for grouping
iris_species <- subset(iris, select = Species)
iris_metrics <- subset(iris, select = -Species)
# Compute diff for each metric with respect to its species
iris_diff <- aggregate(iris_metrics, iris_species, diff)

I'm using diff only to illustrate that my function reshapes the time series: the result is another series, possibly of a different length, and definitely not a single aggregate value (e.g. a mean).

I'd like to transform the result, which seems to be a matrix with list-valued cells, back into the original "flat" data frame format.

I'm mostly curious about how to manage this with results from aggregate, but I'd be ok with solutions that do everything in plyr or reshape.

asked Mar 01 '13 22:03

Vince Gatto

2 Answers

As you might know, aggregate works on one column at a time. A single value is expected, and odd things happen if you return vectors of length different from 1.
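To see what the "odd things" look like here, a minimal sketch that rebuilds the question's objects and inspects the result. It assumes the default simplify = TRUE, under which aggregate packs equal-length results into matrix columns rather than flat rows:

```r
# Rebuild the objects from the question so the sketch runs on its own
iris_species <- subset(iris, select = Species)
iris_metrics <- subset(iris, select = -Species)
iris_diff <- aggregate(iris_metrics, iris_species, diff)

dim(iris_diff)                     # one row per species, one column per variable
is.matrix(iris_diff$Sepal.Length)  # each metric "column" is itself a matrix
dim(iris_diff$Sepal.Length)        # species x (50 - 1) differences
```

So the output keeps one row per group and stores each group's whole series across that row, which is why it does not look like the original data frame.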

You can split this up with by to get the data (with fewer rows than in iris) and put it back together:

b <- by(iris_metrics, iris_species, FUN=function(x) diff(as.matrix(x)))
do.call(rbind, lapply(names(b), function(x) data.frame(Species=x, b[[x]])))

diff(as.matrix(x)) is used because diff differences each column of a matrix, which is what you want here, but it does not do that for a data frame. The key point is that the function returns a different number of rows for each Species than there are in iris.
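As a quick sanity check of the above, a small sketch (the names m and flat_by are mine, purely for illustration) showing that diff on a matrix drops one row, and that the reassembled result is flat with one fewer row per species:

```r
# diff on a matrix works column by column and returns one row fewer
m <- as.matrix(iris[1:3, 1:4])
dim(diff(m))   # 2 rows, 4 columns

# Same approach as above, written against iris directly
b <- by(iris[, 1:4], iris["Species"], FUN = function(x) diff(as.matrix(x)))
flat_by <- do.call(rbind, lapply(names(b), function(s) data.frame(Species = s, b[[s]])))

dim(flat_by)            # rows = 3 species x 49 differences, plus the Species column
table(flat_by$Species)  # each species contributes the same number of rows
```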

answered Oct 07 '22 05:10

Matthew Lundberg

The best solution I could think of in this case is data.table:

library(data.table)
# Key the table by Species, then apply diff to every non-grouping column (.SD) within each group
dt <- data.table(iris, key="Species")
dt.out <- dt[, lapply(.SD, diff), by=Species]

And if you want a plyr solution, then the idea is basically the same. Split by Species and apply diff to each column.

require(plyr)
ddply(iris, .(Species), function(x) do.call(cbind, lapply(x[,1:4], diff)))
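For comparison, the same split/apply/combine idea can be sketched in base R without either package. This is only an illustration under the assumption that the four metric columns are columns 1:4 of iris; the names pieces and flat are hypothetical:

```r
# Split the metric columns by species and difference each column within a group
pieces <- lapply(split(iris[, 1:4], iris$Species), function(x) {
  as.data.frame(lapply(x, diff))
})

# Put the group label back on each piece and stack the pieces into one data frame
flat <- do.call(rbind, Map(function(s, d) data.frame(Species = s, d),
                           names(pieces), pieces))
rownames(flat) <- NULL

head(flat)
```

The result has the same columns as iris (with Species first) and one fewer row per species, which is the "flat" shape the question asks for.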

answered Oct 07 '22 04:10

Arun
