Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

tidyverse pivot_longer several sets of columns, but avoid intermediate mutate_wider steps [duplicate]

Tags:

r

pivot

tidyr

I have the following data

dat <- data.frame(id         = c("A", "B", "C"),
                  Q1r1_pepsi = c(1,0,1),
                  Q1r1_cola  = c(0,0,1),
                  Q1r2_pepsi = c(1,1,1),
                  Q1r2_cola  = c(0,1,1),
                  stringsAsFactors = FALSE)

where Q1r1 and Q1r2 are rating questions in a survey and pepsi and cola are the brands being rated. So I have two ratings (r1 and r2) for two brands (pepsi, cola):

id      Q1r1_c1    Q1r1_c2    Q1r2_c1    Q1r2_c2
"A"     1          0          1          0
"B"     0          0          1          1
"C"     1          1          1          1

(Side question: how do I format a SO post so that it correctly contains the nicely formatted output that I would get when calling dat in the R Console?)

To analyze the data I need to reshape (pivot) the data so that rows indicate unique rating-brand pairs. Thus, the expected outcome would be:

id      brand   Q1r1    Q1r2
"A"     "pepsi" 1       1
"A"     "cola"  0       0
"B"     "pepsi" 0       1
"B"     "cola"  0       1
"C"     "pepsi" 1       1
"C"     "cola"  1       1

Currently, I always do a combination of pivot_longer and pivot_wider, but I was hoping that I can directly get this result by pivoting_longer without doing the intermediate step:

library(tidyverse)

dat_long <- dat %>%
  pivot_longer(cols = starts_with("Q1")) %>%
  separate(name, into = c("item", "brand"), remove = FALSE)

dat_wide <- dat_long %>%
  pivot_wider(id_cols = c(id, brand),
              names_from = item,
              values_from = value)

With this current example it's still ok to do this intermediate step, but it gets tiresome in other less clean examples, e.g. suppose my columns weren't named in a nice structure with Q1r1_c1, Q1r1_c2, Q1r2_c1, Q1r2_c2, but instead would be Q4, Q5, Q8r1, Q8r2 where the map would be between Q4 and Q8r1, and Q5/Q8r2, respectively.

like image 582
deschen Avatar asked Dec 17 '25 19:12

deschen


2 Answers

You can use :

tidyr::pivot_longer(dat, cols = -id, 
                   names_to = c('.value', 'brand'), 
                   names_sep = "_")


#  id    brand  Q1r1  Q1r2
#  <chr> <chr> <dbl> <dbl>
#1 A     pepsi     1     1
#2 A     cola      0     0
#3 B     pepsi     0     1
#4 B     cola      0     1
#5 C     pepsi     1     1
#6 C     cola      1     1
like image 150
Ronak Shah Avatar answered Dec 20 '25 12:12

Ronak Shah


Following @Ronak Shah's suggestion, I'm pasting my code here in case my column names are less structured. I'm building up this code on the dat data I specified in my initial post.

names(dat) <- c("id", "Q4", "Q5", "Q8r1", "Q8r2")

spec <- data.frame(.name  = names(dat)[-1],
                   .value = c("Q4Q5", "Q4Q5", "Q8", "Q8"),
                   brand  = rep(c("pepsi", "cola"), 2),
                   stringsAsFactors = FALSE)

dat_long <- pivot_longer_spec(dat, spec)

This gives essentially the same result as for the data with my structured names, just that the names are different now.

Again, I'm not entirely sure how generalizable this approach is, but it worked in my case.

like image 27
deschen Avatar answered Dec 20 '25 13:12

deschen



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!