Sample data:
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34))
id year month new.employee
1 A 2014 1 4
2 A 2014 2 6
3 A 2015 1 2
4 A 2015 2 6
5 B 2014 1 23
6 B 2014 2 2
7 B 2015 1 5
8 B 2015 2 34
Desired outcome:
desired_df <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=c(2014,2014,2015,2015),
month=c(1,2),
new.employee=c(4,6,2,6,23,2,5,34),
new.employee.rank=c(1,1,2,2,2,2,1,1))
id year month new.employee new.employee.rank
1 A 2014 1 4 1
2 A 2014 2 6 1
3 A 2015 1 2 2
4 A 2015 2 6 2
5 B 2014 1 23 2
6 B 2014 2 2 2
7 B 2015 1 5 1
8 B 2015 2 34 1
The ranking rule is: I choose month 2 in each year to rank number of new employees between A and B. Then I need to give those ranks to month 1. i.e., month 1 of each year rankings must be equal to month 2 ranking in the same year.
I tried these code to get rankings for each month and each year,
library(data.table)
df1 <- data.table(df1)
df1[,rank:=rank(new.employee), by=c("year","month")]
If (anyone can roll the rank value within a column to replace rank of month 1 by rank of month 2 ), it might be a solution.
You've tried a data.table
solution, so here's how would I do this using data.table
library(data.table) # V1.9.6+
temp <- setDT(df1)[month == 2L, .(id, frank(-new.employee)), by = year]
df1[temp, new.employee.rank := i.V2, on = c("year", "id")]
df1
# id year month new.employee new.employee.rank
# 1: A 2014 1 4 1
# 2: A 2014 2 6 1
# 3: A 2015 1 2 2
# 4: A 2015 2 6 2
# 5: B 2014 1 23 2
# 6: B 2014 2 2 2
# 7: B 2015 1 5 1
# 8: B 2015 2 34 1
It appears somewhat similar to the above dplyr
solution. Which is basically ranks the id
s per year and joins them back to the original data set. I'm using data.table
V1.9.6+ here.
Here's a dplyr
-based solution. The idea is to reduce the data to the parts you want to compare, make the comparison, then join the results back into the original data set, expanding it to fill all of the relevant slots. Note the edits to your code for creating the sample data.
df1 <- data.frame(id=c("A","A","A","A","B","B","B","B"),
year=rep(c(2014,2014,2015,2015), 2),
month=rep(c(1,2), 4),
new.employee=c(4,6,2,6,23,2,5,34))
library(dplyr)
df1 %>%
# Reduce the data to the slices (months) you want to compare
filter(month==2) %>%
# Group the data by year, so the comparisons are within and not across years
group_by(year) %>%
# Create a variable that indicates the rankings within years in descending order
mutate(rank = rank(-new.employee)) %>%
# To prepare for merging, reduce the new data to just that ranking var plus id and year
select(id, year, rank) %>%
# Use left_join to merge the new data (.) with the original df, expanding the
# new data to fill all rows with id-year matches
left_join(df1, .) %>%
# Order the data by id, year, and month to make it easier to review
arrange(id, year, month)
Output:
Joining by: c("id", "year")
id year month new.employee rank
1 A 2014 1 4 1
2 A 2014 2 6 1
3 A 2015 1 2 2
4 A 2015 2 6 2
5 B 2014 1 23 2
6 B 2014 2 2 2
7 B 2015 1 5 1
8 B 2015 2 34 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With