I am observing that slice() changes the ordering of the rows in some circumstances when group_by() is used.
library(dplyr)  # provides %>%, group_by(), slice(), filter(), row_number()
tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))
tmp_df2 %>%
  group_by(a) %>%
  slice(1)
gives
Source: local data frame [4 x 2]
Groups: a [4]

      a     b
  <dbl> <dbl>
1     1     1
2     2     3
3     3     2
4     4     4
and
tmp_df2 %>%
  group_by(a) %>%
  filter(row_number() == 1)
gives
Source: local data frame [4 x 2]
Groups: a [4]

      a     b
  <dbl> <dbl>
1     1     1
2     3     2
3     2     3
4     4     4
It looks like slice() reorders the output in ascending order of the grouping variables. However, the documentation suggests that slice() and filter() should behave in the same way, particularly this passage from ?slice (emphasis mine):
Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().
Looking at the code, slice() works by iterating over the groups, so its output comes out in group order. I suspect it is more efficient than the equivalent filter() approach, which would explain why it exists; otherwise there would be little benefit to including it.
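If the original row order matters, one possible workaround is to record each row's position before grouping and sort by it afterwards. This is only a sketch, not something taken from the dplyr documentation: the column name orig_row and the result name first_per_group are made up for illustration. It should give the same order whether or not slice() reorders by group.

```r
library(dplyr)

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

# orig_row is a hypothetical helper column, not part of the original data
first_per_group <- tmp_df2 %>%
  mutate(orig_row = row_number()) %>%  # remember each row's original position
  group_by(a) %>%
  slice(1) %>%                         # first row of each group
  ungroup() %>%
  arrange(orig_row) %>%                # restore the original row order
  select(-orig_row)                    # drop the helper column again

first_per_group
```

The explicit arrange() makes the ordering independent of how slice() happens to traverse the groups, at the cost of one extra column and a sort.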
I would have left this as a comment, but I don't have enough rep, so be gentle with down-voting if I'm wrong.