Assigning names to the list output of dplyr do operation

Question

The do function in the dplyr package usually returns a list. Is there a way to assign names to that list based on the input to do? Specifically, I pass the result of group_by and would like the names of the list to indicate which group each element corresponds to.

Here is a toy example of what I want to achieve:

> it = data.frame(ind=c("a","a","b","b","c"),var1=c(1,2,3,4,5), var2=c(2,3,4,2,2))
> group_by(it,ind)%.%summarise(min(var1))
Source: local data frame [3 x 2]

  ind min(var1)
1   c         5
2   b         3
3   a         1

Now the same thing with do:

> do(group_by(it,ind),function(x)min(x[,"var1"]))
[[1]]
[1] 5

[[2]]
[1] 3

[[3]]
[1] 1

Ideally the names should be c("c","b","a").

Is this possible? And why does dplyr reverse the sort order of the groups? Note that in my case the result of the do operation is an lm object.

Edit: A comment asked for a realistic example, so here is what I had in mind. I fit a model to each group of the data (dummy code):

res <- do(group_by(data,Index),lm,formula=y~x)

Now I want to do various things like

sapply(res,coef)

So I want to relate the results back to the original dataset, in this case to know which Index each set of coefficients corresponds to.

Edit 2: The desired behaviour can be achieved with the dlply function from plyr:

dlply(it,~ind,function(d)min(d[,"var1"]))

$a
[1] 1

$b
[1] 3

$c
[1] 5

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  ind
1   a
2   b
3   c

I would like to know whether this behaviour can be replicated with dplyr, preferably with minimal intervention.

G. Grothendieck · Accepted Answer

Try this modified version of do.grouped_df, which names the output list by the group labels:

do2 <- function (.data, .f, ...) {
    if (is.null(attr(.data, "indices"))) {
        .data <- dplyr:::grouped_df_impl(.data, attr(.data, "vars"), 
            attr(.data, "drop"))
    }
    index <- attr(.data, "indices")
    out <- vector("list", length(index))
    for (i in seq_along(index)) {
        subs <- .data[index[[i]] + 1L, , drop = FALSE]
        out[[i]] <- .f(subs, ...)
    }
    nms <- as.character(attr(.data, "labels")[[1]])
    setNames(out, nms)
}

library(dplyr)

it %.% group_by(ind) %.% do2(function(x) min(x$var1))

which gives:

$a
[1] 1

$b
[1] 3

$c
[1] 5

It could also be combined with fn$ from the gsubfn package like this to shorten it slightly:

library(dplyr)
library(gsubfn)

it %.% group_by(ind) %.% fn$do2(~ min(x$var1))

giving the same answer.
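
For comparison, here is a minimal base R sketch of the same idea that does not depend on dplyr internals. It is not part of the original answer: it swaps in split() and lapply(), relying on the fact that split() names its pieces by the grouping values and lapply() keeps those names. The second column is called var2 here and the formula var2 ~ var1 is only a stand-in for the y ~ x of the question; both are assumptions made for illustration.

```r
# Toy data from the question; naming the second column var2 is an assumption
it <- data.frame(ind  = c("a", "a", "b", "b", "c"),
                 var1 = c(1, 2, 3, 4, 5),
                 var2 = c(2, 3, 4, 2, 2))

# split() returns a list of data frames, one per group, named by the group value
pieces <- split(it, it$ind)

# lapply() preserves the names of its input, so the result is named by group
res <- lapply(pieces, function(d) min(d$var1))
names(res)

# The same pattern with a model per group; the formula is a placeholder
fits <- lapply(pieces, function(d) lm(var2 ~ var1, data = d))

# sapply() carries the group names over as column names of the coefficient matrix
sapply(fits, coef)
```

Because the names travel with the list, sapply(fits, coef) returns a matrix whose columns are labelled by group, which is the link back to the original data that the question asks for.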

Tags:

r

dplyr

mpiktas
