
parse values based on groups in R

I have a very large dataset; a sample of it looks like this:

| Id | Name    | Start_Date | End_Date   |
|----|---------|------------|------------|
| 10 | Mark    | 4/2/1999   | 7/5/2018   |
| 10 |         | 1/1/2000   | 9/24/2018  |
| 25 |         | 5/3/1968   | 6/3/2000   |
| 25 |         | 6/6/2009   | 4/23/2010  |
| 25 | Anthony | 2/20/2010  | 7/21/2016  |
| 25 |         | 9/12/2014  | 11/26/2019 |

I need to fill in the blank entries of the Name column based on Id, so that the output table looks like:

| Id | Name    | Start_Date | End_Date   |
|----|---------|------------|------------|
| 10 | Mark    | 4/2/1999   | 7/5/2018   |
| 10 | Mark    | 1/1/2000   | 9/24/2018  |
| 25 | Anthony | 5/3/1968   | 6/3/2000   |
| 25 | Anthony | 6/6/2009   | 4/23/2010  |
| 25 | Anthony | 2/20/2010  | 7/21/2016  |
| 25 | Anthony | 9/12/2014  | 11/26/2019 |

How can I achieve an output as shown above? I went through the substitute and parse functions, but was unable to understand how they apply to this problem.

My dataset would be:

df=data.frame(Id=c("10","10","25","25","25","25"),Name=c("Mark","","","","Anthony",""),
              Start_Date=c("4/2/1999", "1/1/2000","5/3/1968","6/6/2009","2/20/2010","9/12/2014"),
              End_Date=c("7/5/2018","9/24/2018","6/3/2000","4/23/2010","7/21/2016","11/26/2019"))
asked Jan 26 '23 22:01 by hk2


1 Answer

We can change the blanks ("") to NA and then use fill within each Id, first downward and then upward, so that every NA is replaced by the nearest non-NA name in its group:

library(dplyr)
library(tidyr)
df1 %>%      
   mutate(Name = na_if(Name, "")) %>%
   group_by(Id) %>%
   fill(Name, .direction = "down") %>%
   fill(Name, .direction = "up")
# A tibble: 6 x 4
# Groups:   Id [2]
#  Id    Name    Start_Date End_Date  
#  <chr> <chr>   <chr>      <chr>     
#1 10    Mark    4/2/1999   7/5/2018  
#2 10    Mark    1/1/2000   9/24/2018 
#3 25    Anthony 5/3/1968   6/3/2000  
#4 25    Anthony 6/6/2009   4/23/2010 
#5 25    Anthony 2/20/2010  7/21/2016 
#6 25    Anthony 9/12/2014  11/26/2019
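
To see why both fill directions are needed, here is a minimal sketch that runs the same pipeline one step at a time. The names step1, step2 and step3 are only illustrative, and the date columns are left out to keep it short.

```r
library(dplyr)
library(tidyr)

# Id and Name columns from the question's sample data
df1 <- data.frame(
  Id = c("10", "10", "25", "25", "25", "25"),
  Name = c("Mark", "", "", "", "Anthony", ""),
  stringsAsFactors = FALSE
)

# Step 1: blanks become NA, because fill() only treats NA as missing
step1 <- df1 %>%
  mutate(Name = na_if(Name, ""))

# Step 2: fill downward within each Id;
# rows that come before the first name in a group are still NA
step2 <- step1 %>%
  group_by(Id) %>%
  fill(Name, .direction = "down") %>%
  ungroup()

# Step 3: fill upward within each Id to cover those leading NA rows
step3 <- step2 %>%
  group_by(Id) %>%
  fill(Name, .direction = "up") %>%
  ungroup()

step2$Name  # "Mark" "Mark" NA NA "Anthony" "Anthony"
step3$Name  # "Mark" "Mark" "Anthony" "Anthony" "Anthony" "Anthony"
```

For Id 25 the name sits in the third row, so the downward pass alone leaves the first two rows as NA; the upward pass fills them.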

In the devel version of tidyr ('0.8.3.9000'), this can be done in a single fill call, because .direction = "downup" is also an option:

df1 %>%      
   mutate(Name = na_if(Name, "")) %>%
   group_by(Id) %>%
   fill(Name, .direction = "downup") 

Another option is to group by 'Id' and set 'Name' to the first non-blank element of each group:

df1 %>%
    group_by(Id) %>%        
    mutate(Name = first(Name[Name!=""])) 
# A tibble: 6 x 4
# Groups:   Id [2]
#  Id    Name    Start_Date End_Date  
#  <chr> <chr>   <chr>      <chr>     
#1 10    Mark    4/2/1999   7/5/2018  
#2 10    Mark    1/1/2000   9/24/2018 
#3 25    Anthony 5/3/1968   6/3/2000  
#4 25    Anthony 6/6/2009   4/23/2010 
#5 25    Anthony 2/20/2010  7/21/2016 
#6 25    Anthony 9/12/2014  11/26/2019
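
The two approaches give the same result on the sample data because each Id has exactly one name. As a rough sketch of where they differ, the example below uses hypothetical data (df2, with an Id "30" that is not in the question) in which one Id carries two different names.

```r
library(dplyr)
library(tidyr)

# Hypothetical data, not from the question: one Id with two different names
df2 <- data.frame(
  Id = c("30", "30", "30", "30"),
  Name = c("", "Ann", "", "Bob"),
  stringsAsFactors = FALSE
)

# Approach 1: fill down, then up, within each Id
filled <- df2 %>%
  mutate(Name = na_if(Name, "")) %>%
  group_by(Id) %>%
  fill(Name, .direction = "down") %>%
  fill(Name, .direction = "up") %>%
  ungroup()

# Approach 2: first non-blank name in each Id
first_name <- df2 %>%
  group_by(Id) %>%
  mutate(Name = first(Name[Name != ""])) %>%
  ungroup()

filled$Name      # "Ann" "Ann" "Ann" "Bob"
first_name$Name  # "Ann" "Ann" "Ann" "Ann"
```

So fill keeps every name that is present and only fills the gaps, while the first-non-blank version overwrites the whole group with a single name. Which one is right depends on whether an Id is supposed to map to exactly one name.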

data

df1 <- structure(list(Id = c("10", "10", "25", "25", "25", "25"), Name = c("Mark", 
"", "", "", "Anthony", ""), Start_Date = c("4/2/1999", "1/1/2000", 
"5/3/1968", "6/6/2009", "2/20/2010", "9/12/2014"), End_Date = c("7/5/2018", 
"9/24/2018", "6/3/2000", "4/23/2010", "7/21/2016", "11/26/2019"
)), class = "data.frame", row.names = c(NA, -6L))
answered Jan 28 '23 10:01 by akrun