In dplyr 0.5.0, on a grouped data frame, why does slice(1) not give the same row ordering as filter(row_number() == 1)?

Question

I am observing that slice changes the ordering of the rows in some circumstances when group_by is used.

library(dplyr)

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

tmp_df2 %>%
    group_by(a) %>%
    slice(1)

gives

Source: local data frame [4 x 2]
Groups: a [4]

      a     b
  <dbl> <dbl>
1     1     1
2     2     3
3     3     2
4     4     4

and

tmp_df2 %>%
    group_by(a) %>%
    filter(row_number() == 1)

gives

Source: local data frame [4 x 2]
Groups: a [4]

      a     b
  <dbl> <dbl>
1     1     1
2     3     2
3     2     3
4     4     4

It looks like slice reorders the output in ascending order of the grouping variable. However, the documentation suggests that slice and filter should behave the same way. In particular, from ?slice (emphasis mine):

Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().

stephematician · Accepted Answer

Looking at the code, slice() works by iterating over the groups and collecting the selected rows from each group in turn, so its output comes back ordered by group. filter(), by contrast, keeps rows in their original order. I suspect slice() is more efficient than the equivalent filter() approach, and that this is why it exists at all, since otherwise there would be no benefit to including it.
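To illustrate the difference, here is a minimal base-R sketch of the two strategies. It is not dplyr's actual source; the names rows_by_group, slice_like and filter_like are made up for this example, and it assumes groups are visited in sorted key order, which matches the output shown in the question.

```r
# Illustrative sketch only: mimics the two strategies in base R,
# it is NOT how dplyr implements slice() or filter() internally.
tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

# "slice-like": visit each group in turn (split() orders groups by sorted key)
# and take the first row index from each group.
rows_by_group <- split(seq_len(nrow(tmp_df2)), tmp_df2$a)
slice_like_idx <- unlist(lapply(rows_by_group, function(idx) idx[1]),
                         use.names = FALSE)
slice_like <- tmp_df2[slice_like_idx, ]

# "filter-like": compute a within-group row number for every row, build a
# logical mask, and subset once, which leaves the original row order intact.
within_group_row <- ave(seq_len(nrow(tmp_df2)), tmp_df2$a, FUN = seq_along)
filter_like <- tmp_df2[within_group_row == 1, ]

print(slice_like$a)   # group order
print(filter_like$a)  # original order
```

If the original order matters, the filter(row_number() == 1) form from the question is the one that keeps it.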

I would have left this as a comment, but I don't have enough rep - so be gentle with down-voting if I'm wrong

Tags: r, dplyr

Alex

1 Answers

stephematician
