# Sample dataframe
library(dplyr)
set.seed(123)
d = data.frame(x = runif(120), grp = gl(3, 40))
# Select top_n
d %>%
  group_by(grp) %>%
  top_n(n=3, wt=x)
How do I select both the top and bottom observations within the same pipe? I have tried the following, but it does not work:
# helper function
my_top_bott = function(x, n, wt) {
  x1 = x %>% top_n(n=n, wt=wt)
  x2 = x %>% top_n(n=n, wt=-wt)
  x = bind_rows(x1, x2)
  return(x)
}
# Pipe
d %>%
  group_by(grp) %>%
  my_top_bott(., n=3, wt=x)
One possibility could be:
d %>%
  group_by(grp) %>%
  filter(dense_rank(x) <= 3 | dense_rank(desc(x)) <= 3)
   x        grp
   <dbl>    <fct>
 1 0.0456   1
 2 0.957    1
 3 0.0421   1
 4 0.994    1
 5 0.963    1
 6 0.0246   1
 7 0.858    2
 8 0.0458   2
 9 0.895    2
10 0.0948   2
11 0.815    2
12 0.000625 2
13 0.103    3
14 0.985    3
15 0.0936   3
16 0.954    3
17 0.0607   3
18 0.954    3
Or a possibility proposed by @IceCreamToucan:
d %>%
  group_by(grp) %>%
  filter(!between(dense_rank(x), 3 + 1, n() - 3))
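One caveat worth checking: the between() variant compares dense_rank() against n(), so it seems to rely on there being no ties within a group. A minimal sketch on made-up data (the `toy` data frame is hypothetical, not from the question) shows the two filters diverging when a value is repeated:

```r
library(dplyr)

# Hypothetical toy data: the value 1 appears twice, so dense_rank() tops out at 7 while n() is 8.
toy <- data.frame(x = c(1, 1, 2, 3, 4, 5, 6, 7))

# Keep rows whose value is among the 3 lowest or the 3 highest distinct values.
by_both_ranks <- toy %>%
  filter(dense_rank(x) <= 3 | dense_rank(desc(x)) <= 3)

# Drop rows whose dense rank falls between 4 and n() - 3.
by_between <- toy %>%
  filter(!between(dense_rank(x), 3 + 1, n() - 3))

by_both_ranks$x  # 1 1 2 3 5 6 7
by_between$x     # 1 1 2 3 6 7
```

With continuous data such as runif() output, ties are very unlikely, so both versions should agree on the sample data frame.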
Or a possibility involving match():
d %>%
  group_by(grp) %>%
  filter(!is.na(x[match(x, sort(x)[c(1:3, (n()-2):n())])]))
You could also use row_number():
d %>%
  group_by(grp) %>%
  arrange(desc(x)) %>%
  filter(row_number() > max(row_number()) - 3 | row_number() <= 3)
   x       grp
   <dbl>   <fct>
 1 0.995   2
 2 0.975   2
 3 0.975   1
 4 0.974   3
 5 0.974   3
 6 0.960   1
 7 0.960   3
 8 0.951   2
 9 0.874   1
10 0.127   2
11 0.104   2
12 0.0693  1
13 0.0520  1
14 0.0279  2
15 0.0146  3
16 0.0114  3
17 0.00864 1
18 0.00333 3
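As for why the helper in the question fails: the column name passed as wt is most likely not forwarded to top_n() unquoted, so top_n() ends up looking for an object called x outside the data. A minimal sketch of a helper that forwards the column with the embrace operator, assuming dplyr 1.0.0 or later (for slice_max(), slice_min() and `{{ }}`); the name top_bottom is made up for illustration:

```r
library(dplyr)

set.seed(123)
d <- data.frame(x = runif(120), grp = gl(3, 40))

# Hypothetical helper: keeps the n largest and n smallest rows of each group.
# {{ wt }} forwards the bare column name to slice_max()/slice_min().
top_bottom <- function(data, n, wt) {
  bind_rows(
    slice_max(data, order_by = {{ wt }}, n = n, with_ties = FALSE),
    slice_min(data, order_by = {{ wt }}, n = n, with_ties = FALSE)
  )
}

res <- d %>%
  group_by(grp) %>%
  top_bottom(n = 3, wt = x)
```

Here with_ties = FALSE caps each call at exactly n rows per group; leaving it at the default would keep tied rows as well, which is closer to how top_n() behaves.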