Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Choose a month of a year to rank then give resulting ranks to the rest years

Sample data:

df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
                        year=c(2014,2014,2015,2015),
                        month=c(1,2),
                        new.employee=c(4,6,2,6,23,2,5,34))

  id year month new.employee
1  A 2014     1            4
2  A 2014     2            6
3  A 2015     1            2
4  A 2015     2            6
5  B 2014     1           23
6  B 2014     2            2
7  B 2015     1            5
8  B 2015     2           34

Desired outcome:

desired_df <- data.frame(id=c("A","A","A","A","B","B","B","B"),
                        year=c(2014,2014,2015,2015),
                        month=c(1,2),
                        new.employee=c(4,6,2,6,23,2,5,34),
                        new.employee.rank=c(1,1,2,2,2,2,1,1))

  id year month new.employee new.employee.rank
1  A 2014     1            4                 1
2  A 2014     2            6                 1
3  A 2015     1            2                 2
4  A 2015     2            6                 2
5  B 2014     1           23                 2
6  B 2014     2            2                 2
7  B 2015     1            5                 1
8  B 2015     2           34                 1

The ranking rule is: I choose month 2 in each year to rank number of new employees between A and B. Then I need to give those ranks to month 1. i.e., month 1 of each year rankings must be equal to month 2 ranking in the same year.

I tried these code to get rankings for each month and each year,

library(data.table)
df1 <- data.table(df1)
df1[,rank:=rank(new.employee), by=c("year","month")]

If (anyone can roll the rank value within a column to replace rank of month 1 by rank of month 2 ), it might be a solution.

like image 819
Pham Cong Minh Avatar asked Mar 15 '23 06:03

Pham Cong Minh


2 Answers

You've tried a data.table solution, so here's how would I do this using data.table

library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
#    id year month new.employee new.employee.rank
# 1:  A 2014     1            4                 1
# 2:  A 2014     2            6                 1
# 3:  A 2015     1            2                 2
# 4:  A 2015     2            6                 2
# 5:  B 2014     1           23                 2
# 6:  B 2014     2            2                 2
# 7:  B 2015     1            5                 1
# 8:  B 2015     2           34                 1

It appears somewhat similar to the above dplyr solution. Which is basically ranks the ids per year and joins them back to the original data set. I'm using data.table V1.9.6+ here.

like image 121
David Arenburg Avatar answered Mar 16 '23 20:03

David Arenburg


Here's a dplyr-based solution. The idea is to reduce the data to the parts you want to compare, make the comparison, then join the results back into the original data set, expanding it to fill all of the relevant slots. Note the edits to your code for creating the sample data.

df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
                        year=rep(c(2014,2014,2015,2015), 2),
                        month=rep(c(1,2), 4),
                        new.employee=c(4,6,2,6,23,2,5,34))

library(dplyr)

df1 %>%
  # Reduce the data to the slices (months) you want to compare
  filter(month==2) %>%
  # Group the data by year, so the comparisons are within and not across years
  group_by(year) %>%
  # Create a variable that indicates the rankings within years in descending order
  mutate(rank = rank(-new.employee)) %>%
  # To prepare for merging, reduce the new data to just that ranking var plus id and year
  select(id, year, rank) %>%
  # Use left_join to merge the new data (.) with the original df, expanding the
  # new data to fill all rows with id-year matches
  left_join(df1, .) %>%
  # Order the data by id, year, and month to make it easier to review
  arrange(id, year, month)

Output:

Joining by: c("id", "year")
  id year month new.employee rank
1  A 2014     1            4    1
2  A 2014     2            6    1
3  A 2015     1            2    2
4  A 2015     2            6    2
5  B 2014     1           23    2
6  B 2014     2            2    2
7  B 2015     1            5    1
8  B 2015     2           34    1
like image 32
ulfelder Avatar answered Mar 16 '23 18:03

ulfelder