Reorder variable according to conditions

Tags: dataframe, r, dplyr

I would like to order my dataset by each company's production in 2018-19, from highest to lowest. So company_code 3 goes first (5000), then company_code 1 (2000), and then company_code 2 (1000).

I have a dataset like this:

company_code financial_year production
1 2018-19 2000
1 2019-20 2500
1 2020-21 3000
1 2018-21 7500
2 2018-19 1000
2 2019-20 1500
2 2020-21 1000
2 2020-21 3500
3 2018-19 5000
3 2019-20 5500
3 2020-21 4000
3 2018-21 14500
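To make the examples easy to reproduce, the sample data can be rebuilt as a plain data frame. This is a sketch: the values are copied from the table above, and the column types (company_code and production numeric, financial_year character) are an assumption.

```r
# Sample data from the question (column types are assumed)
dataset <- data.frame(
  company_code   = rep(c(1, 2, 3), each = 4),
  financial_year = c("2018-19", "2019-20", "2020-21", "2018-21",
                     "2018-19", "2019-20", "2020-21", "2020-21",
                     "2018-19", "2019-20", "2020-21", "2018-21"),
  production     = c(2000, 2500, 3000, 7500,
                     1000, 1500, 1000, 3500,
                     5000, 5500, 4000, 14500),
  stringsAsFactors = FALSE
)
```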

I would like to end up with:

company_code financial_year production
3 2018-19 5000
3 2019-20 5500
3 2020-21 4000
3 2018-21 14500
1 2018-19 2000
1 2019-20 2500
1 2020-21 3000
1 2018-21 7500
2 2018-19 1000
2 2019-20 1500
2 2020-21 1000
2 2020-21 3500

I tried:

dataset <- dataset %>% mutate(COMPANY_CODE = reorder(COMPANY_CODE, -production[financial_year=="2018/19"]))

But this does not work. Could anyone help? Many thanks.
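Part of the problem is that reorder() expects its second argument to hold one value per row, while the attempt passes a subset. A minimal base R sketch, using only the financial_year column copied from the data above, shows the sizes involved: the label "2018/19" matches no rows (the data use "2018-19"), and even the correct label selects only three of the twelve rows. The attempt also refers to COMPANY_CODE, while the column is named company_code.

```r
# financial_year column from the question's data
financial_year <- c("2018-19", "2019-20", "2020-21", "2018-21",
                    "2018-19", "2019-20", "2020-21", "2020-21",
                    "2018-19", "2019-20", "2020-21", "2018-21")

length(financial_year)            # 12: one value per row is what reorder() needs
sum(financial_year == "2018/19")  # 0: the label used in the attempt matches nothing
sum(financial_year == "2018-19")  # 3: one row per company
```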

Dan asked Nov 23 '25 17:11



2 Answers

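One option would be to use a helper "column" that keeps the production values for 2018-19 and sets all other values to 0, and then to use FUN = sum in reorder(). Summed per company, the helper gives each company's 2018-19 production, and negating it puts the largest producer first: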

library(dplyr)

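dataset %>% 
  # helper: production on each company's 2018-19 row, 0 on all other rows;
  # FUN = sum collapses it to one score per company, negated so the largest comes first
  mutate(company_code = reorder(company_code,
                                -ifelse(financial_year == "2018-19", production, 0),
                                FUN = sum)) %>% 
  arrange(company_code)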
#>    company_code financial_year production
#> 1             3        2018-19       5000
#> 2             3        2019-20       5500
#> 3             3        2020-21       4000
#> 4             3        2018-21      14500
#> 5             1        2018-19       2000
#> 6             1        2019-20       2500
#> 7             1        2020-21       3000
#> 8             1        2018-21       7500
#> 9             2        2018-19       1000
#> 10            2        2019-20       1500
#> 11            2        2020-21       1000
#> 12            2        2020-21       3500
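To see what the helper does, here is a base R sketch of the same reorder() call without dplyr. The data are rebuilt from the question, and the names helper, f and sorted are illustrative. The "scores" attribute that reorder() attaches holds each company's negated 2018-19 production, and the levels follow those scores in increasing order.

```r
dataset <- data.frame(
  company_code   = rep(c(1, 2, 3), each = 4),
  financial_year = c("2018-19", "2019-20", "2020-21", "2018-21",
                     "2018-19", "2019-20", "2020-21", "2020-21",
                     "2018-19", "2019-20", "2020-21", "2018-21"),
  production     = c(2000, 2500, 3000, 7500, 1000, 1500, 1000, 3500,
                     5000, 5500, 4000, 14500),
  stringsAsFactors = FALSE
)

# 2018-19 production on that year's rows, 0 elsewhere
helper <- ifelse(dataset$financial_year == "2018-19", dataset$production, 0)
# one score per company: the sum of the negated helper
f <- reorder(dataset$company_code, -helper, FUN = sum)

attr(f, "scores")  # company 1: -2000, company 2: -1000, company 3: -5000
levels(f)          # "3" "1" "2"

# sorting by the factor's level order gives company 3, then 1, then 2
sorted <- dataset[order(f), ]
```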
stefan answered Nov 26 '25 06:11



You could create a temporary column holding each company's production in financial_year '2018-19', arrange the data by this column in descending order, and then drop it.

library(dplyr)

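df %>%
  group_by(company_code) %>%
  # assumes exactly one '2018-19' row per company; its production is
  # recycled to every row of that company
  mutate(tmp = production[financial_year == '2018-19']) %>%
  ungroup() %>%
  arrange(desc(tmp)) %>%  # largest 2018-19 production first
  select(-tmp)            # drop the temporary column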

# # A tibble: 12 × 3
#    company_code financial_year production
#           <dbl> <chr>               <int>
#  1            3 2018-19              5000
#  2            3 2019-20              5500
#  3            3 2020-21              4000
#  4            3 2018-21             14500
#  5            1 2018-19              2000
#  6            1 2019-20              2500
#  7            1 2020-21              3000
#  8            1 2018-21              7500
#  9            2 2018-19              1000
# 10            2 2019-20              1500
# 11            2 2020-21              1000
# 12            2 2020-21              3500
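The same temporary-key idea can be written in base R; this sketch swaps in ave() and order() for group_by(), mutate() and arrange(). The key is a per-company sum of a helper that is 0 outside 2018-19, so a company without a 2018-19 row would simply get a key of 0. order() leaves ties in their original order, so each company's rows keep their original sequence.

```r
dataset <- data.frame(
  company_code   = rep(c(1, 2, 3), each = 4),
  financial_year = c("2018-19", "2019-20", "2020-21", "2018-21",
                     "2018-19", "2019-20", "2020-21", "2020-21",
                     "2018-19", "2019-20", "2020-21", "2018-21"),
  production     = c(2000, 2500, 3000, 7500, 1000, 1500, 1000, 3500,
                     5000, 5500, 4000, 14500),
  stringsAsFactors = FALSE
)

# per-row key: the company's 2018-19 production, repeated on all of its rows
key <- ave(ifelse(dataset$financial_year == "2018-19", dataset$production, 0),
           dataset$company_code, FUN = sum)

# sort by descending key; rows of the same company keep their original order
sorted <- dataset[order(-key), ]
rownames(sorted) <- NULL
```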

Along the lines of @stefan's reorder() solution, forcats also offers fct_reorder2(), a more flexible variant that reorders a factor according to two other vectors:

library(forcats)

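df %>%
  # .fun receives each company's production (x) and financial_year (y) and
  # returns its 2018-19 production; .desc = TRUE (the default) puts the largest first
  arrange(fct_reorder2(as.factor(company_code), production, financial_year,
                       .fun = \(x, y) x[y == '2018-19']))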
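Roughly speaking, fct_reorder2() applies .fun to each level's slice of the two vectors and orders the levels by the result, largest first by default. The base R sketch below imitates that per-level step with split() and sapply(); it is an illustration of the idea, not forcats' own implementation, and the names fun_2018 and per_company are hypothetical.

```r
dataset <- data.frame(
  company_code   = rep(c(1, 2, 3), each = 4),
  financial_year = c("2018-19", "2019-20", "2020-21", "2018-21",
                     "2018-19", "2019-20", "2020-21", "2020-21",
                     "2018-19", "2019-20", "2020-21", "2018-21"),
  production     = c(2000, 2500, 3000, 7500, 1000, 1500, 1000, 3500,
                     5000, 5500, 4000, 14500),
  stringsAsFactors = FALSE
)

# the same function passed as .fun above: x = production, y = financial_year
fun_2018 <- function(x, y) x[y == "2018-19"]

# apply it to each company's rows, giving one value per company
per_company <- sapply(split(dataset, dataset$company_code),
                      function(d) fun_2018(d$production, d$financial_year))

per_company                                  # company 1: 2000, 2: 1000, 3: 5000
names(sort(per_company, decreasing = TRUE))  # "3" "1" "2"
```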
Darren Tsai answered Nov 26 '25 08:11



