I'm fairly new to R and I'm trying to use aggregate to perform some time series shaping on a dataframe, per subject and for each metric in my dataset. This works beautifully, but I find that the result isn't in a format that's very easy to use. I'd like to be able to transform the results back into the same format as the original dataframe.
Using the iris dataset as an example:
# Split into two data frames, one for metrics, the other for grouping
iris_species = subset(iris, select=Species)
iris_metrics = subset(iris, select=-Species)
# Compute diff for each metric with respect to its species
iris_diff = aggregate(iris_metrics, iris_species, diff)
I'm just using diff to illustrate that I have a function that shapes the time series, so I get a time series of possibly different length as a result and definitely not a single aggregate value (e.g. mean).
I'd like to transform the result, which seems to be a matrix with list-valued cells, back into the original "flat" dataframe format.
I'm mostly curious about how to manage this with results from aggregate, but I'd be ok with solutions that do everything in plyr or reshape.
As you might know, aggregate works on one column at a time. A single value is expected, and odd things happen if you return vectors of length different from 1.
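To make the "odd things" concrete, here is a small sketch that inspects what aggregate returns for the question's example. It rebuilds iris_diff so the snippet runs on its own, and it relies on the default simplify = TRUE. Because every species has 50 rows, each group yields 49 differences, and aggregate packs them into a matrix stored inside a single data frame column.

```r
# Rebuild the question's objects so this snippet is self-contained
iris_species <- subset(iris, select = Species)
iris_metrics <- subset(iris, select = -Species)
iris_diff <- aggregate(iris_metrics, iris_species, diff)

# The result still has one row per species and one column per variable
dim(iris_diff)

# Each metric "column" is really a matrix: one row per species,
# one column per difference within that species
is.matrix(iris_diff$Sepal.Length)
dim(iris_diff$Sepal.Length)
```

If the groups produced vectors of different lengths, the columns would be lists rather than matrices, which is equally awkward to work with.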
You can split this up with by to get the data (with fewer rows than in iris) and put it back together:
b <- by(iris_metrics, iris_species, FUN=function(x) diff(as.matrix(x)))
do.call(rbind, lapply(names(b), function(x) data.frame(Species=x, b[[x]])))
diff(as.matrix(x)) is used because diff on a matrix differences each column down the rows, which is what you want here; diff does not do this for data frames. The key point is that the function returns a different number of rows than each Species has in iris.
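If you would rather keep the aggregate call itself, one possibility is to pass simplify = FALSE so that each cell holds a plain vector, and then unpack the cells group by group. This is only a sketch, not the approach described above: the names agg and flat are illustrative, and it assumes that within a group every metric returns a vector of the same length (true for diff).

```r
# Keep each group's result as a vector inside a list column
agg <- aggregate(iris[1:4], iris["Species"], diff, simplify = FALSE)

# Rebuild a flat data frame: for each group (row of agg), pull that
# group's vector out of every metric column and bind them side by side
flat <- do.call(rbind, lapply(seq_len(nrow(agg)), function(i) {
  cells <- lapply(agg[-1], function(col) col[[i]])
  data.frame(Species = agg$Species[i], cells)
}))

# One row per difference, with the original column layout plus Species
str(flat)
```

The Species value is recycled to match the length of each group's vectors, so groups of different sizes are handled as long as the metrics agree within a group.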
The best solution I could think of in this case is data.table:
require(data.table)
dt <- data.table(iris, key="Species")
dt.out <- dt[, lapply(.SD, diff), by=Species]
And if you want a plyr solution, then the idea is basically the same. Split by Species and apply diff to each column.
require(plyr)
ddply(iris, .(Species), function(x) do.call(cbind, lapply(x[,1:4], diff)))