 

Collapsing a data.frame to a data.frame -- problems with by() and aggregate()

Tags:

r

Suppose I have the following data, and a function that returns the summary statistics I want:

landlines <- data.frame(
                year=rep(c(1990,1995,2000,2005,2010),times=3),
                country=rep(c("US", "Brazil", "Asia"), each=5),
                pct =  c(0.99, 0.99, 0.98, 0.05, 0.9,
                         0.4,  0.5,  0.55, 0.5,  0.45,
                         0.7,  0.85, 0.9,  0.85, 0.75)
                )
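someStats <- function(x)
{
  # centre pct and year so the regression needs no intercept
  dp <- as.matrix(x$pct)-mean(x$pct)
  indp <- as.matrix(x$year)-mean(x$year)
  # slope of pct on year
  f <- lm.fit( indp,dp )$coefficients
  w <- sd(x$pct)   # standard deviation of pct
  m <- min(x$pct)  # minimum pct
  results <- c(f,w,m)
  names(results) <- c("coef","sdev", "minPct")
  results
}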

I can apply that function to a data subset successfully like this:

> someStats(landlines[landlines$country=="US",])
      coef      sdev    minPct 
 -0.022400  0.410938  0.050000 

or look at a breakdown by country like this:

> by(landlines, list(country=landlines$country), someStats)
country: Asia
      coef       sdev     minPct 
0.00200000 0.08215838 0.70000000 
--------------------------------------------------------------------------------------- 
country: Brazil
      coef       sdev     minPct 
0.00200000 0.05700877 0.40000000 
--------------------------------------------------------------------------------------- 
country: US
      coef      sdev    minPct 
-0.022400  0.410938  0.050000 

Trouble is, that is not the data.frame object I need for further processing, and it can't be coerced into one:

> as.data.frame( by(landlines, list(country=landlines$country), someStats) )
Error in as.data.frame.default(by(landlines, list(country = landlines$country),  : 
  cannot coerce class '"by"' into a data.frame
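
One base R workaround, sketched here under the assumption that every group returns a vector of the same length with the same names: a "by" object is really a one-dimensional list array holding one result per group, so do.call(rbind, ...) can stack those results into a matrix that data.frame() accepts. The names res, stats and out are illustrative only.

```r
# Data and function from the question
landlines <- data.frame(
  year = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct = c(0.99, 0.99, 0.98, 0.05, 0.9,
          0.4,  0.5,  0.55, 0.5,  0.45,
          0.7,  0.85, 0.9,  0.85, 0.75)
)
someStats <- function(x) {
  dp <- as.matrix(x$pct) - mean(x$pct)
  indp <- as.matrix(x$year) - mean(x$year)
  f <- lm.fit(indp, dp)$coefficients
  results <- c(f, sd(x$pct), min(x$pct))
  names(results) <- c("coef", "sdev", "minPct")
  results
}

res <- by(landlines, list(country = landlines$country), someStats)

# Each element of the by object is one group's named vector;
# rbind() stacks them into a matrix whose column names come from those names.
stats <- do.call(rbind, res)

# Group labels come from the by object's dimnames; row.names = NULL
# gives the result plain integer row names.
out <- data.frame(country = dimnames(res)[[1]], stats, row.names = NULL)
print(out)
```

This relies only on base R, at the cost of rebuilding the country column by hand.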

"No problem!" I think, since the similar aggregate() function does return a data.frame:

> aggregate(landlines$pct, by=list(country=landlines$country), min)
  country    x
1    Asia 0.70
2  Brazil 0.40
3      US 0.05

Trouble is, it doesn't work properly with arbitrary functions:

> aggregate(landlines, by=list(country=landlines$country), someStats)
Error in x$pct : $ operator is invalid for atomic vectors
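
The error arises because aggregate() on a data.frame calls FUN separately for each column within each group, passing that column as a plain vector, so x$pct has nothing to index. A small illustrative check (seen is just a throwaway name) is to aggregate with a function that reports the class of whatever it receives:

```r
landlines <- data.frame(
  year = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct = c(0.99, 0.99, 0.98, 0.05, 0.9,
          0.4,  0.5,  0.55, 0.5,  0.45,
          0.7,  0.85, 0.9,  0.85, 0.75)
)

# FUN sees one column of one group at a time, never a sub-data.frame,
# so every cell of the result reports the class of a single column.
seen <- aggregate(landlines[c("year", "pct")],
                  by = list(country = landlines$country),
                  FUN = class)
print(seen)
```

Every cell reports "numeric", so a function such as someStats, which needs year and pct together, cannot get what it expects from aggregate().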

What I really want to get is a data.frame object with the following columns:

  • country
  • coef
  • sdev
  • minPct

How can I do that?

Brian B, asked Apr 04 '12 14:04


1 Answer

Take a look at the plyr package, and in particular its ddply() function:

> library(plyr)
> ddply(landlines, .(country), someStats)
  country    coef       sdev minPct
1    Asia  0.0020 0.08215838   0.70
2  Brazil  0.0020 0.05700877   0.40
3      US -0.0224 0.41093795   0.05

Ideally your function would explicitly return a data.frame, but in this case its named vector result can be coerced to one easily and correctly.
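
Following that suggestion, here is a minimal sketch of a data.frame-returning variant. someStatsDF and pieces are hypothetical names, not from the answer, and the split()/lapply()/do.call(rbind, ...) combination is swapped in for ddply() only so the sketch runs without plyr installed; with plyr loaded, the same function could be passed to ddply() directly.

```r
landlines <- data.frame(
  year = rep(c(1990, 1995, 2000, 2005, 2010), times = 3),
  country = rep(c("US", "Brazil", "Asia"), each = 5),
  pct = c(0.99, 0.99, 0.98, 0.05, 0.9,
          0.4,  0.5,  0.55, 0.5,  0.45,
          0.7,  0.85, 0.9,  0.85, 0.75)
)

# Hypothetical variant of someStats that returns a one-row data.frame
someStatsDF <- function(x) {
  dp <- as.matrix(x$pct) - mean(x$pct)
  indp <- as.matrix(x$year) - mean(x$year)
  data.frame(coef = unname(lm.fit(indp, dp)$coefficients),
             sdev = sd(x$pct),
             minPct = min(x$pct))
}

# With plyr: ddply(landlines, .(country), someStatsDF)
# Base R: split by country, label each one-row result, then stack them.
pieces <- lapply(split(landlines, landlines$country), function(d)
  data.frame(country = d$country[1], someStatsDF(d)))
out <- do.call(rbind, pieces)
rownames(out) <- NULL  # drop the group-name row names added by rbind
print(out)
```

Returning a data.frame keeps the column names under the function's control instead of relying on how a named vector happens to be coerced.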

Justin, answered Sep 20 '22 22:09