I have a dataframe where I have the total number of points someone scored in the past 3 years (2016, 2017, 2018), but also columns with their number of points per year.
My dataframe looks like this:
myDF <- data.frame(ID =c(1,1,1,2,2,3,4),
Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
Total_Points = c(5, 5, 5, 4, 4, 2, 3),
Points2016 = c(3, NA, NA, 2, NA, NA, 3),
Points2017 = c(NA,1,NA,NA,2,NA,NA),
Points2018= c(NA,NA,1, NA, NA, 2, NA))
The problem is that I would like to copy the values of columns "Points2016", "Points2017" and "Points2017" for every group so that their entries look the same.
I'm not sure the explanation was clear so this would be my expected output:
myDF_final <- data.frame(ID =c(1,1,1,2,2,3,4),
Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
Total_Points = c(5, 5, 5, 4, 4, 2, 3),
Points2016 = c(3, 3, 3, 2, 2, NA, 3),
Points2017 = c(1,1,1,2,2,NA,NA),
Points2018= c(1,1,1, NA, NA, 2, NA))
Basically, I would like to have the same values for the columns "Points201X" for every ID.
I think you could just fill by the ID
group in both directions. With dplyr
and tidyr
we could do:
library(dplyr)
library(tidyr)
myDF %>%
group_by(ID) %>%
fill(Points2016, Points2017, Points2018) %>%
fill(Points2016, Points2017, Points2018, .direction = "up")
Returns:
ID Dates Total_Points Points2016 Points2017 Points2018 1 1 2016 5 3 1 1 2 1 2017 5 3 1 1 3 1 2018 5 3 1 1 4 2 2016 4 2 2 NA 5 2 2017 4 2 2 NA 6 3 2018 2 NA NA 2 7 4 2016 3 3 NA NA
Also, if you have a bunch of years say 1970 - 2018, you could do something like:
myDF %>%
gather(points_year, points, -c(ID, Dates, Total_Points)) %>%
group_by(ID, points_year) %>%
fill(points) %>%
fill(points, .direction = "up") %>%
spread(points_year, points)
So as to avoid typing out every year. However, this involves gathering and spreading the data which might be unnecessary assuming the variables we need to fill
follow a consistent naming convention. In this case, there is a consistent naming convention and we could use the tidyselect
backend of dplyr
to fill all variables that start with the word "Points":
myDF %>%
group_by(ID) %>%
fill(starts_with("Points"), .direction = "down") %>%
fill(starts_with("Points"), .direction = "up")
Alternatively, this seems to work with data.table
and zoo
:
library(data.table)
library(zoo)
dt <- as.data.table(myDF)
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x)), by = ID, .SDcols = 4:6]
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x, fromLast = TRUE)), by = ID, .SDcols = 4:6]
This one liner seems to do it all in one go as well:
dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf(x)), by = ID, .SDcols = 4:6]
ID Dates Total_Points Points2016 Points2017 Points2018 1: 1 2016 5 3 1 1 2: 1 2017 5 3 1 1 3: 1 2018 5 3 1 1 4: 2 2016 4 2 2 NA 5: 2 2017 4 2 2 NA 6: 3 2018 2 NA NA 2 7: 4 2016 3 3 NA NA
You could also use zoo::na.locf0
to fill NA
s from the top & bottom.
library(tidyverse);
library(zoo);
myDF %>%
group_by(ID) %>%
mutate_at(vars(contains("Points20")), funs(na.locf0(., fromLast = F))) %>%
mutate_at(vars(contains("Points20")), funs(na.locf0(., fromLast = T)))
## A tibble: 7 x 6
## Groups: ID [4]
# ID Dates Total_Points Points2016 Points2017 Points2018
# <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#1 1. 2016 5. 3. 1. 1.
#2 1. 2017 5. 3. 1. 1.
#3 1. 2018 5. 3. 1. 1.
#4 2. 2016 4. 2. 2. NA
#5 2. 2017 4. 2. 2. NA
#6 3. 2018 2. NA NA 2.
#7 4. 2016 3. 3. NA NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With