Given a data set, we can use top_n to limit the number of rows (i.e. sort/rank) we get back within the tidyverse. I love the flexibility of most tidyverse operations in that they can, in most cases, be undone, i.e. you can go back to where you started.
Using data and a possible solution (which I wrote) from a question on here, how can I best undo a top_n?
Data:
df<-structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L,
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L,
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L,
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))
Code:
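library(tidyverse)

df %>%
  gather(key, value) %>%       # reshape to long format: one row per (column, value)
  group_by(key) %>%
  summarise(Sum = sum(value)) %>%
  arrange(desc(Sum)) %>%
  top_n(3, Sum) %>%            # keep the three largest sums
  ungroup()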
The above gives me this:
# A tibble: 3 x 2
key Sum
<chr> <int>
1 eggs 23
2 bread 22
3 juice 20
Now what I would like to (learn how to) do is go back to the original data set without deleting code, i.e. programmatically recover from a top_n:
Naturally I thought of spreading (res is the above result):
spread(res,key,Sum)
# A tibble: 1 x 3
bread eggs juice
<int> <int> <int>
1 22 23 20
However, I can't (yet) think of how to proceed from there, or of an alternative solution that undoes top_n. How can I best achieve this?
A similar idea using pull, but with a slightly different approach:
library(tidyverse)
df %>%
summarise_all(sum) %>% # Your method of selecting
gather(key, val) %>% # top three columns
top_n(3) %>% #
arrange(-val) %>% #
pull(key) %>% # pull 'key'
select(df, .) # select cols from df by `.`
# eggs bread juice
#1 4 4 3
#2 4 5 4
#3 7 2 6
#4 3 1 5
#5 5 10 2
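In recent releases, top_n and gather are marked as superseded in favour of slice_max and pivot_longer. The following is a minimal sketch of the same idea with those verbs, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later; the names top_keys and recovered are illustrative and not from the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L,
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L,
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L,
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

# Names of the three columns with the largest sums
top_keys <- df %>%
  summarise(across(everything(), sum)) %>%
  pivot_longer(everything(), names_to = "key", values_to = "val") %>%
  slice_max(val, n = 3) %>%
  pull(key)

# Go back to the original rows, keeping only those columns
recovered <- select(df, all_of(top_keys))
recovered
```

Because df itself is never modified, "undoing" the top-3 step amounts to using its result as a column selector on the original data.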
And, developing the idea from the previous question:
df[, names(sort(colSums(df), decreasing = TRUE))[1:3]]
This gives the same result.
Here's a very dense base R solution:
# order() returns the column positions sorted by decreasing column sum
df[, order(-colSums(df))[1:3]]
eggs bread juice
1 4 4 3
2 4 5 4
3 7 2 6
4 3 1 5
5 5 10 2
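When indexing columns by their sums, the index vector has to hold column positions (or names) in sorted order, which is what order() and sort() provide. rank() instead returns each column's rank, which is only a valid index when the two happen to coincide. As a minimal base R sketch, the selection can be wrapped in a small helper; top_cols is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: keep the n columns with the largest column sums,
# ordered from largest to smallest sum.
top_cols <- function(data, n = 3) {
  sums <- colSums(data)
  n <- min(n, length(sums))                       # guard against n > ncol(data)
  keep <- names(sort(sums, decreasing = TRUE))[seq_len(n)]
  data[, keep, drop = FALSE]                      # drop = FALSE keeps a data frame when n = 1
}

# Same data as in the question
df <- structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L,
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L,
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L,
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

res <- top_cols(df, 3)
names(res)
# [1] "eggs"  "bread" "juice"
```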