
dplyr: access current group variable

After using data.table for quite some time, I thought it was time to try dplyr. It's fun, but I wasn't able to figure out how to:
- access the current grouping variable
- return multiple values per group

The following example works fine with data.table. How would you write this with dplyr?

library(data.table)
library(dplyr)

foo <- matrix(c(1, 2, 3, 4), ncol = 2)
dt <- data.table(a = c(1, 1, 2), b = c(4, 5, 6))

# data.table (expected)
dt[, .(c = foo[, a]), by = a]
#    a c
# 1: 1 1
# 2: 1 2
# 3: 2 3
# 4: 2 4

# dplyr (?)
dt %>% 
  group_by(a) %>% 
  summarize(c = foo[a])
Asked by Fabian Gehring on Jul 29 '16 at 16:07


2 Answers

You can still access the grouping variable, but inside summarize() it is an ordinary vector with one element per row of the group, all holding the same value, so wrapping it in unique() makes it work. Also, dplyr does not seem to expand rows automatically the way data.table does, so you need unnest() from the tidyr package:

library(dplyr); library(tidyr)
dt %>% 
      group_by(a) %>% 
      summarize(c = list(foo[, unique(a)])) %>% 
      unnest(c)  # name the list column; recent tidyr warns if cols is omitted

# Source: local data frame [4 x 2]

#       a     c
#   <dbl> <dbl>
# 1     1     1
# 2     1     2
# 3     2     3
# 4     2     4

Or we can use first() to speed this up, since we already know the grouping variable is constant within each group:

dt %>% 
      group_by(a) %>% 
      summarize(c = list(foo[, first(a)])) %>% 
      unnest(c)  # name the list column; recent tidyr warns if cols is omitted

# Source: local data frame [4 x 2]

#       a     c
#   <dbl> <dbl>
# 1     1     1
# 2     1     2
# 3     2     3
# 4     2     4
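
On newer dplyr versions the same idea can be sketched without list() and unnest(). This assumes dplyr >= 1.1.0 (newer than what the code above was written against), where reframe() may return several rows per group and cur_group() exposes the current group's key. A plain data.frame stands in for the data.table here, purely to keep the sketch self-contained:

```r
library(dplyr)

# Same toy data as in the question, held in a plain data.frame here
# (an assumption made to keep the sketch free of data.table).
foo <- matrix(c(1, 2, 3, 4), ncol = 2)
dt <- data.frame(a = c(1, 1, 2), b = c(4, 5, 6))

res <- dt %>%
  group_by(a) %>%
  # cur_group() is a one-row tibble holding the current group's key,
  # so cur_group()$a is a single value even when the group has several rows.
  reframe(c = foo[, cur_group()$a])

res  # four rows: a = 1, 1, 2, 2 and c = 1, 2, 3, 4
```

Because cur_group()$a is already a single value, neither unique() nor first() is needed in this variant.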
Answered by Psidom on Oct 06 '22 at 01:10


We can use do() from dplyr (no other packages needed). do() is very handy for expanding rows; we only need to wrap the result in data.frame().

dt %>% 
     group_by(a) %>%
     do(data.frame(c = foo[, unique(.$a)]))
#      a     c
#  <dbl> <dbl>
#1     1     1
#2     1     2
#3     2     3
#4     2     4

Or, instead of unique(), we can take the first observation of each group:

dt %>% 
    group_by(a) %>%
    do(data.frame(c = foo[, .$a[1]]))
#     a     c
#  <dbl> <dbl>
#1     1     1
#2     1     2
#3     2     3
#4     2     4

This can also be done in base R, without any packages:

stack(lapply(split(dt$a, dt$a), function(x) foo[,unique(x)]))[2:1]
#   ind values
#1   1      1
#2   1      2
#3   2      3
#4   2      4
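
To see how the base R one-liner works, it can be unrolled into separate steps. The intermediate names (groups, cols, res) are only illustrative, and a plain data.frame is assumed in place of the data.table:

```r
# Same toy data as in the question, as a plain data.frame (an assumption).
foo <- matrix(c(1, 2, 3, 4), ncol = 2)
dt <- data.frame(a = c(1, 1, 2), b = c(4, 5, 6))

# 1. Split the grouping column by itself: one vector of keys per group.
groups <- split(dt$a, dt$a)          # list(`1` = c(1, 1), `2` = 2)

# 2. For each group, look up the matching column of foo.
cols <- lapply(groups, function(x) foo[, unique(x)])

# 3. stack() turns the named list into a long data.frame with the columns
#    `values` and `ind`; [2:1] swaps them so the group key comes first.
res <- stack(cols)[2:1]

res
```

Note that stack() stores the group key in ind as a factor built from the list names, not as the original numeric column.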
Answered by akrun on Oct 05 '22 at 23:10