Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R - Copy values within a group

Tags:

copy

r

replicate

I have a dataframe where I have the total number of points someone scored in the past 3 years (2016, 2017, 2018), but also columns with their number of points per year.

My dataframe looks like this:

myDF <- data.frame(ID =c(1,1,1,2,2,3,4),
 Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
 Total_Points = c(5, 5, 5, 4, 4, 2, 3),
 Points2016 = c(3, NA, NA, 2, NA, NA, 3),
 Points2017 = c(NA,1,NA,NA,2,NA,NA),
 Points2018= c(NA,NA,1, NA, NA, 2, NA))

The problem is that I would like to copy the values of columns "Points2016", "Points2017" and "Points2017" for every group so that their entries look the same.

I'm not sure the explanation was clear so this would be my expected output:

myDF_final <- data.frame(ID =c(1,1,1,2,2,3,4),
               Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
               Total_Points = c(5, 5, 5, 4, 4, 2, 3),
               Points2016 = c(3, 3, 3, 2, 2, NA, 3),
               Points2017 = c(1,1,1,2,2,NA,NA),
               Points2018= c(1,1,1, NA, NA, 2, NA))

Basically, I would like to have the same values for the columns "Points201X" for every ID.

like image 796
AllanLC Avatar asked Dec 08 '22 15:12

AllanLC


2 Answers

I think you could just fill by the ID group in both directions. With dplyr and tidyr we could do:

library(dplyr)
library(tidyr)

myDF %>% 
  group_by(ID) %>% 
  fill(Points2016, Points2017, Points2018) %>% 
  fill(Points2016, Points2017, Points2018, .direction = "up")

Returns:

  ID Dates Total_Points Points2016 Points2017 Points2018
1  1  2016            5          3          1          1
2  1  2017            5          3          1          1
3  1  2018            5          3          1          1
4  2  2016            4          2          2         NA
5  2  2017            4          2          2         NA
6  3  2018            2         NA         NA          2
7  4  2016            3          3         NA         NA

Also, if you have a bunch of years say 1970 - 2018, you could do something like:

myDF %>% 
  gather(points_year, points, -c(ID, Dates, Total_Points)) %>% 
  group_by(ID, points_year) %>% 
  fill(points) %>% 
  fill(points, .direction = "up") %>% 
  spread(points_year, points)

So as to avoid typing out every year. However, this involves gathering and spreading the data which might be unnecessary assuming the variables we need to fill follow a consistent naming convention. In this case, there is a consistent naming convention and we could use the tidyselect backend of dplyr to fill all variables that start with the word "Points":

myDF %>% 
  group_by(ID) %>% 
  fill(starts_with("Points"), .direction = "down") %>% 
  fill(starts_with("Points"), .direction = "up")

Alternatively, this seems to work with data.table and zoo:

library(data.table)
library(zoo)

dt <- as.data.table(myDF)

dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x)), by = ID, .SDcols = 4:6]
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x, fromLast = TRUE)), by = ID, .SDcols = 4:6]

This one liner seems to do it all in one go as well:

dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf(x)), by = ID, .SDcols = 4:6]
   ID Dates Total_Points Points2016 Points2017 Points2018
1:  1  2016            5          3          1          1
2:  1  2017            5          3          1          1
3:  1  2018            5          3          1          1
4:  2  2016            4          2          2         NA
5:  2  2017            4          2          2         NA
6:  3  2018            2         NA         NA          2
7:  4  2016            3          3         NA         NA
like image 115
tyluRp Avatar answered Dec 21 '22 23:12

tyluRp


You could also use zoo::na.locf0 to fill NAs from the top & bottom.

library(tidyverse);
library(zoo);
myDF %>%
    group_by(ID) %>%
    mutate_at(vars(contains("Points20")), funs(na.locf0(., fromLast = F))) %>%
    mutate_at(vars(contains("Points20")), funs(na.locf0(., fromLast = T)))
## A tibble: 7 x 6
## Groups:   ID [4]
#     ID Dates Total_Points Points2016 Points2017 Points2018
#  <dbl> <fct>        <dbl>      <dbl>      <dbl>      <dbl>
#1    1. 2016            5.         3.         1.         1.
#2    1. 2017            5.         3.         1.         1.
#3    1. 2018            5.         3.         1.         1.
#4    2. 2016            4.         2.         2.        NA
#5    2. 2017            4.         2.         2.        NA
#6    3. 2018            2.        NA         NA          2.
#7    4. 2016            3.         3.        NA         NA
like image 22
Maurits Evers Avatar answered Dec 21 '22 23:12

Maurits Evers