Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Missing prefix in pivot_longer with paired columns

Tags:

r

tidyr

I have a "wide" data.frame with a series of paired columns. My goal is to unpivot this using pivot_longer. There is lot's of help for paired columns with prefixes in all columns. But in my case only half of the pairs has prefixes and I don't know how to define the "names_pattern", if this is even possible.

In the following example all paired columns have a prefix which I can split with the names_pattern attribute of pivot_longer:

df_wide <- tribble(
  ~id , ~f_start , ~d_start , ~f_end , ~d_end ,
  'A' , 'p' , '2018-01-01' , 'p' , '2018-02-01' ,
  'B' , 'i' , '2019-04-01' , 'p' , '2020-01-01' ,
  'C' , 'i' , '2018-06-01' , 'i' , '2019-03-01' ,
  'D' , 'p' , '2019-12-01' , 'p' , '2020-05-01' ,
  'E' , 'p' , '2019-02-01' , 'p' , '2019-05-01' ,
  'F' , 'i' , '2018-04-01' , 'i' , '2018-07-01' ,
)

df_wide %>% 
  pivot_longer(-id ,
               names_to = c('.value' , 'status') ,
               names_pattern = '(.*)_(.*)'
               )

This leads to my expected output:

   id    status f     d         
   <chr> <chr>  <chr> <chr>     
 1 A     start  p     2018-01-01
 2 A     end    p     2018-02-01
 3 B     start  i     2019-04-01
 4 B     end    p     2020-01-01
 5 C     start  i     2018-06-01
 6 C     end    i     2019-03-01
 7 D     start  p     2019-12-01
 8 D     end    p     2020-05-01
 9 E     start  p     2019-02-01
10 E     end    p     2019-05-01
11 F     start  i     2018-04-01
12 F     end    i     2018-07-01

Unfortunately my data.frame looks like this with missing prefix in one half of the pairs.

df_wide<- tribble(
  ~id , ~f_start , ~start , ~f_end , ~end ,
  'A' , 'p' , '2018-01-01' , 'p' , '2018-02-01' ,
  'B' , 'i' , '2019-04-01' , 'p' , '2020-01-01' ,
  'C' , 'i' , '2018-06-01' , 'i' , '2019-03-01' ,
  'D' , 'p' , '2019-12-01' , 'p' , '2020-05-01' ,
  'E' , 'p' , '2019-02-01' , 'p' , '2019-05-01' ,
  'F' , 'i' , '2018-04-01' , 'i' , '2018-07-01' ,
)

Has anyone an idea how to achieve the expected output?

Thanks in advance

like image 605
bthebread Avatar asked May 30 '20 20:05

bthebread


1 Answers

If we can rename the missing prefixes with rename_at, the OP's solution would work

library(dplyr)
library(tidyr)
library(stringr)
df_wide %>%
     rename_at(vars(matches('^(start|end)')), ~ str_c('d_', .)) %>%
     pivot_longer(-id ,
               names_to = c('.value' , 'status') ,
               names_pattern = '(.*)_(.*)'
               )
# A tibble: 12 x 4
#   id    status f     d         
#   <chr> <chr>  <chr> <chr>     
# 1 A     start  p     2018-01-01
# 2 A     end    p     2018-02-01
# 3 B     start  i     2019-04-01
# 4 B     end    p     2020-01-01
# 5 C     start  i     2018-06-01
# 6 C     end    i     2019-03-01
# 7 D     start  p     2019-12-01
# 8 D     end    p     2020-05-01
# 9 E     start  p     2019-02-01
#10 E     end    p     2019-05-01
#11 F     start  i     2018-04-01
#12 F     end    i     2018-07-01
like image 58
akrun Avatar answered Oct 17 '22 21:10

akrun