Suppose I have the following data and a function that returns summary statistics I like:
landlines <- data.frame(
  year = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct = c(0.99, 0.99, 0.98, 0.05, 0.9,
          0.4, 0.5, 0.55, 0.5, 0.45,
          0.7, 0.85, 0.9, 0.85, 0.75)
)
someStats <- function(x)
{
  # center both variables so the regression needs no intercept
  dp <- as.matrix(x$pct) - mean(x$pct)
  indp <- as.matrix(x$year) - mean(x$year)
  f <- lm.fit(indp, dp)$coefficients  # slope of pct on year
  w <- sd(x$pct)
  m <- min(x$pct)
  results <- c(f, w, m)
  names(results) <- c("coef", "sdev", "minPct")
  results
}
I can apply that function to a data subset successfully like this:
> someStats(landlines[landlines$country=="US",])
coef sdev minPct
-0.022400 0.410938 0.050000
or look at a breakdown by country like this:
> by(landlines, list(country=landlines$country), someStats)
country: Asia
coef sdev minPct
0.00200000 0.08215838 0.70000000
---------------------------------------------------------------------------------------
country: Brazil
coef sdev minPct
0.00200000 0.05700877 0.40000000
---------------------------------------------------------------------------------------
country: US
coef sdev minPct
-0.022400 0.410938 0.050000
Trouble is, that is not the data.frame object I need for further processing, and it won't cast as such:
> as.data.frame( by(landlines, list(country=landlines$country), someStats) )
Error in as.data.frame.default(by(landlines, list(country = landlines$country), :
cannot coerce class '"by"' into a data.frame
"No problem!" I think, since the similar aggregate() function does return a data.frame:
> aggregate(landlines$pct, by=list(country=landlines$country), min)
country x
1 Asia 0.70
2 Brazil 0.40
3 US 0.05
Trouble is, it doesn't work properly with arbitrary functions:
> aggregate(landlines, by=list(country=landlines$country), someStats)
Error in x$pct : $ operator is invalid for atomic vectors
What I really want is a data.frame object with the following columns: country, coef, sdev, minPct.
How can I do that?
The aggregate() function computes summary statistics of the data by group, such as mean, min, or sum.
aggregate() is a base R function that, as the name suggests, aggregates the input data.frame by applying the function given in the FUN argument to each column of the sub-data.frames defined by the by argument.
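That column-by-column behavior is likely why someStats fails under aggregate(): FUN is handed a plain vector, so x$pct has nothing to index. A minimal sketch that records what FUN actually receives (the anonymous function and the name seen are only for illustration):

```r
landlines <- data.frame(
  year    = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct     = c(0.99, 0.99, 0.98, 0.05, 0.9,
              0.4, 0.5, 0.55, 0.5, 0.45,
              0.7, 0.85, 0.9, 0.85, 0.75)
)

# For every group and every column, report the class of the piece passed to FUN.
seen <- aggregate(landlines[c("year", "pct")],
                  by  = list(country = landlines$country),
                  FUN = function(v) class(v)[1])
print(seen)
```

Each cell should show a vector class rather than "data.frame", which is why a function that expects the whole sub-data.frame cannot be used with aggregate() directly.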
The data.frame() function works much like cbind(), except that you name each column as you define it. Unlike a matrix, a data.frame can hold both character vectors and numeric vectors in the same object.
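To make the difference concrete, here is a small sketch using the per-country minimums from the question (the variable names are assumptions). cbind() on mixed vectors yields a character matrix, while data.frame() keeps each column's own type:

```r
country <- c("Asia", "Brazil", "US")
minPct  <- c(0.70, 0.40, 0.05)

m  <- cbind(country, minPct)                          # everything coerced to character
df <- data.frame(country = country, minPct = minPct)  # numeric column stays numeric

str(m)
str(df)
```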
Take a look at the plyr package, and in particular ddply:
> library(plyr)
> ddply(landlines, .(country), someStats)
country coef sdev minPct
1 Asia 0.0020 0.08215838 0.70
2 Brazil 0.0020 0.05700877 0.40
3 US -0.0224 0.41093795 0.05
Ideally your function explicitly returns a data.frame, but in this case its result can be coerced to one easily and correctly.
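If you would rather avoid the extra package, a base R route along the same lines is possible. This is only a sketch, not the approach shown above: someStatsDf is a hypothetical variant of someStats that returns a one-row data.frame, and the pieces are combined with split(), lapply() and rbind().

```r
landlines <- data.frame(
  year    = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct     = c(0.99, 0.99, 0.98, 0.05, 0.9,
              0.4, 0.5, 0.55, 0.5, 0.45,
              0.7, 0.85, 0.9, 0.85, 0.75)
)

# Hypothetical variant of someStats() that explicitly returns a one-row data.frame.
someStatsDf <- function(x) {
  dp   <- as.matrix(x$pct) - mean(x$pct)
  indp <- as.matrix(x$year) - mean(x$year)
  data.frame(
    country = x$country[1],
    coef    = as.numeric(lm.fit(indp, dp)$coefficients),
    sdev    = sd(x$pct),
    minPct  = min(x$pct)
  )
}

# Split into one data.frame per country, summarize each, then stack the rows.
pieces <- split(landlines, landlines$country)
result <- do.call(rbind, lapply(pieces, someStatsDf))
rownames(result) <- NULL
print(result)
```

Because every piece is already a data.frame with the same columns, rbind() has nothing to guess about names or types.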