
In dplyr 0.5.0, on a grouped data frame, why does slice(1) not give the same row ordering as filter(row_number() == 1)?

Tags:

r

dplyr

I am observing that slice changes the ordering of the rows in some circumstances when group_by is used.

library(dplyr)

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

tmp_df2 %>%
    group_by(a) %>%
    slice(1)

gives

Source: local data frame [4 x 2]
Groups: a [4]

      a     b
  <dbl> <dbl>
1     1     1
2     2     3
3     3     2
4     4     4

and

tmp_df2 %>%
    group_by(a) %>%
    filter(row_number() == 1)

gives

Source: local data frame [4 x 2]
Groups: a [4]

      a     b
  <dbl> <dbl>
1     1     1
2     3     2
3     2     3
4     4     4

It looks like slice reorders the output in ascending order of the grouping variable. However, the documentation suggests that slice and filter should behave in the same way, particularly this passage from ?slice:

Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().
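Whatever the cause, the two results above contain exactly the same rows and differ only in order, so an explicit sort makes them agree. A minimal sketch, assuming a recent dplyr is installed; the helper column `id` is a hypothetical name used only for illustration:

```r
library(dplyr)

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

# Option 1: keep filter(), then sort by the grouping variable to get the
# same group-ordered layout that slice(1) produced above.
by_group <- tmp_df2 %>%
  group_by(a) %>%
  filter(row_number() == 1) %>%
  arrange(a)

# Option 2: keep slice(), but tag each row with its original position first
# and sort on that tag afterwards to restore the input order.
by_position <- tmp_df2 %>%
  mutate(id = row_number()) %>%
  group_by(a) %>%
  slice(1) %>%
  arrange(id)
```

Because arrange() sorts by the columns you name rather than by group, both options give a deterministic row order that does not depend on how slice or filter happen to lay out their results internally.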

Alex asked Oct 22 '16 01:10



1 Answer

Looking at the code, slice() works by iterating over the groups, so its output comes out in group order (sorted by the grouping variables), whereas filter() keeps the surviving rows in their original positions. I suspect slice() is more efficient than the equivalent filter() approach, and that this is why it exists at all; otherwise there would be no benefit to including it.
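To make the two strategies concrete, here is a base R sketch. It is not dplyr's actual code, and the function names slice_like() and filter_like() are hypothetical. The first walks the groups in sorted key order and concatenates the indices each group selects, so rows come out in group order; the second computes a within-group row number for every row in place and subsets with a logical mask, so rows keep their original order.

```r
# Toy data from the question
tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

# Slice-like: split() orders the groups by their sorted key values, and the
# selected row indices are concatenated group by group.
slice_like <- function(df, key, n = 1) {
  groups <- split(seq_len(nrow(df)), df[[key]])
  idx <- unlist(lapply(groups, function(rows) rows[seq_along(rows) == n]),
                use.names = FALSE)
  df[idx, , drop = FALSE]
}

# Filter-like: ave() returns a within-group row number aligned with the
# original rows, so the logical mask never moves any row.
filter_like <- function(df, key, n = 1) {
  rn <- ave(seq_len(nrow(df)), df[[key]], FUN = seq_along)
  df[rn == n, , drop = FALSE]
}

s <- slice_like(tmp_df2, "a")
f <- filter_like(tmp_df2, "a")
s$b  # 1 3 2 4 (rows in ascending order of a)
f$b  # 1 2 3 4 (rows in original order)
```

Both functions select the same rows; only the order in which the indices are assembled differs, which matches the outputs shown in the question.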

I would have left this as a comment, but I don't have enough rep, so be gentle with the down-votes if I'm wrong.

stephematician answered Sep 29 '22 16:09
