
Return original data set after top_n

Tags: r, dplyr, tidyr

Given a data set, we can use top_n within the tidyverse to limit the number of rows we get back (i.e. keep only the top-ranked rows). I love the flexibility of most tidyverse operations: in most cases they can be undone, i.e. you can go back to where you started.

Using data and a possible solution (which I wrote) from a question on here, how can I best undo a top_n?

Data:

df<-structure(list(milk = c(1L, 2L, 1L, 0L, 4L), bread = c(4L, 5L, 
2L, 1L, 10L), juice = c(3L, 4L, 6L, 5L, 2L), honey = c(1L, 2L, 
0L, 3L, 1L), eggs = c(4L, 4L, 7L, 3L, 5L), beef = c(2L, 3L, 0L, 
1L, 8L)), class = "data.frame", row.names = c(NA, -5L))

Code:

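library(dplyr)
library(tidyr)

res <- df %>% 
  gather(key, value) %>%        # wide to long: one row per (column, value)
  group_by(key) %>% 
  summarise(Sum = sum(value)) %>% 
  arrange(desc(Sum)) %>% 
  top_n(3, Sum) %>%             # keep the three largest sums
  ungroup()
res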

The above gives me this:

# A tibble: 3 x 2
  key     Sum
  <chr> <int>
1 eggs     23
2 bread    22
3 juice    20

Now what I would like to (learn how to) do is go back to the original data set without deleting code, i.e. programmatically recover from a top_n.

Naturally I thought of spreading (res is the above result):

 spread(res,key,Sum)
# A tibble: 1 x 3
  bread  eggs juice
  <int> <int> <int>
1    22    23    20

However, I can't (yet) think of how to proceed from there, or of an alternative solution that undoes top_n. How can I best achieve this?
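
A minimal sketch of what can and cannot be recovered from res, assuming df is still available in the session. The row-level values were collapsed by summarise, so the sums cannot be split back into rows, but the surviving keys still name columns of df. The name recovered is made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  milk  = c(1L, 2L, 1L, 0L, 4L),
  bread = c(4L, 5L, 2L, 1L, 10L),
  juice = c(3L, 4L, 6L, 5L, 2L),
  honey = c(1L, 2L, 0L, 3L, 1L),
  eggs  = c(4L, 4L, 7L, 3L, 5L),
  beef  = c(2L, 3L, 0L, 1L, 8L)
)

# Same pipeline as in the question: one sum per column, top three kept.
res <- df %>%
  gather(key, value) %>%
  group_by(key) %>%
  summarise(Sum = sum(value)) %>%
  arrange(desc(Sum)) %>%
  top_n(3, Sum)

# res holds only 3 sums, so the 5 original rows are not inside it.
# The keys, however, can be used to index back into df.
recovered <- df[, res$key]
```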

NelsonGon asked Dec 17 '22 16:12


2 Answers

A similar idea using pull, but with a slightly different approach:

library(tidyverse)

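df %>%
  summarise_all(sum) %>%     # one row holding the sum of each column
  gather(key, val) %>%       # long form: column name and its sum
  top_n(3, val) %>%          # keep the three largest sums
  arrange(desc(val)) %>%     # largest first
  pull(key) %>%              # pull 'key' out as a character vector
  select(df, .)              # select cols from df by `.`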

#  eggs bread juice
#1    4     4     3
#2    4     5     4
#3    7     2     6
#4    3     1     5
#5    5    10     2

And, developing the idea from the previous question in base R:

df[, names(sort(colSums(df), decreasing = TRUE))[1:3]]

Which gives the same result.
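
If the goal is to keep the filtering step reversible, one option is to never collapse the row-level data in the first place. The following is only a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 (for slice_max, pivot_longer and pivot_wider). The names long, top_keys, top_wide and original are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  milk  = c(1L, 2L, 1L, 0L, 4L),
  bread = c(4L, 5L, 2L, 1L, 10L),
  juice = c(3L, 4L, 6L, 5L, 2L),
  honey = c(1L, 2L, 0L, 3L, 1L),
  eggs  = c(4L, 4L, 7L, 3L, 5L),
  beef  = c(2L, 3L, 0L, 1L, 8L)
)

# Long form with a row id, so the wide shape can be rebuilt later.
long <- df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = "key", values_to = "value")

# The three keys with the largest totals (computed on the side).
top_keys <- long %>%
  group_by(key) %>%
  summarise(Sum = sum(value)) %>%
  slice_max(Sum, n = 3)

# Filter the row-level data by those keys, then widen again.
top_wide <- long %>%
  semi_join(top_keys, by = "key") %>%
  pivot_wider(names_from = key, values_from = value) %>%
  select(all_of(top_keys$key))

# `long` itself was never reduced, so the full table is still recoverable.
original <- long %>%
  pivot_wider(names_from = key, values_from = value) %>%
  select(-row)
```

Because the top-n step only produces a lookup table (top_keys) and the filter is a semi_join against it, dropping or changing that one step is enough to get every column back.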

utubun answered Dec 29 '22 20:12


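Here's a very dense base R solution. Note that it indexes with order(), which returns the column positions sorted by total. rank() returns each column's rank instead, and only picks the right columns when the two vectors happen to coincide (as they do for this df):

df[, order(-colSums(df))[1:3]]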
  eggs bread juice
1    4     4     3
2    4     5     4
3    7     2     6
4    3     1     5
5    5    10     2
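
A small sketch of how rank() and order() differ when used as a column index. On the df above, rank(-colSums(df)) and order(-colSums(df)) both evaluate to 5 2 3 6 1 4, so either one selects eggs, bread and juice. The totals below are made-up numbers chosen so that the two disagree.

```r
# Hypothetical column totals, not taken from df.
totals <- c(a = 20, b = 10, c = 30)

# order(): positions of the elements, largest total first.
by_order <- names(totals)[order(-totals)[1:2]]

# rank(): the rank of each element, in the original element order.
# Using it as an index picks the wrong names here.
by_rank <- names(totals)[rank(-totals)[1:2]]

by_order  # "c" "a"  (the two largest totals)
by_rank   # "b" "c"  (includes b, the smallest total)
```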

Cole answered Dec 29 '22 18:12