Transform dataframe to long format based on text

Question

I have a dataset from a survey. The setup of the survey was as follows: Q1 was a multiple choice question, and in Q2 answers are given for each of the selected choices in Q1 (same for Q3 up to Q9). After exporting and uploading to R, the dataframe is as follows:

df <- data.frame(
  ID = 1:3,
  Q1 = c("Purple, Red, Blue", "Red, Blue, Yellow", "Red"),
  "Q2: Is it your favorite color? - Purple" = c("Average", NA, NA),
  "Q2: Is it your favorite color? - Red" = c("No", "No", "Average"),
  "Q2: Is it your favorite color? - Blue" = c("Yes", "No", NA),
  "Q2: Is it your favorite color? - Yellow" = c(NA, "Average", NA)
)

What I'm looking to do is to transform so that the dataframe is a long format like below

ID	Q1	Q2
1	Purple	Average
1	Red	No
1	Blue	Yes
2	Red	No
2	Blue	No
2	Yellow	Average
3	Red	Average

I was thinking to first separate the answers to Q1 into rows, and then pivot_longer the Q2 columns based on the selected option in Q2.

df1 <- df %>% 
  separate_rows(Q1, sep = ",")

df2 <- df1 %>% 
  pivot_longer(cols = matches(Q1),
               names_to = Q2,
               values_to = Q2_answer)

But this doesn't work (error in pivot_longer: selections can't have missing values).

What am I doing wrong? Or is there another way to do this?

ThomasIsCoding · Accepted Answer

Probably this could help

df %>%
  select(-Q1) %>%
  pivot_longer(-ID, names_to = "Q1", values_to = "Q2") %>%
  na.omit() %>%
  mutate(Q1 = str_extract(Q1, "\w+$"))

or with values_drop_na = TRUE (as suggested by M-- in the comment)

df %>%
  select(-Q1) %>%
  pivot_longer(-ID, names_to = "Q1", values_to = "Q2", values_drop_na = TRUE) %>%
  mutate(Q1 = str_extract(Q1, "\w+$"))

which gives

# A tibble: 7 × 3
     ID Q1     Q2     
  <int> <chr>  <chr>
1     1 Purple Average
2     1 Red    No
3     1 Blue   Yes
4     2 Red    No
5     2 Blue   No
6     2 Yellow Average
7     3 Red    Average

r2evans · Answer

Try this:

library(dplyr)
library(tidyr)
pivot_longer(df, cols = -c(ID, Q1), values_to = "Q2") |>
  separate_longer_delim(Q1, delim = ",") |>
  mutate(Q1 = trimws(Q1)) |>
  filter(sub(".* ", "", name) == Q1) |>
  select(-name)
# # A tibble: 7 × 3
#      ID Q1     Q2     
#   <int> <chr>  <chr>  
# 1     1 Purple Average
# 2     1 Red    No     
# 3     1 Blue   Yes    
# 4     2 Red    No     
# 5     2 Blue   No     
# 6     2 Yellow Average
# 7     3 Red    Average

Data, adding check.names=:

df <- data.frame(
  ID = 1:3,
  Q1 = c("Purple, Red, Blue", "Red, Blue, Yellow", "Red"),
  "Q2: Is it your favorite color? - Purple" = c("Average", NA, NA),
  "Q2: Is it your favorite color? - Red" = c("No", "No", "Average"),
  "Q2: Is it your favorite color? - Blue" = c("Yes", "No", NA),
  "Q2: Is it your favorite color? - Yellow" = c(NA, "Average", NA),
  check.names = FALSE)

Transform dataframe to long format based on text

Tags:

r

tidyr

Ekd

2 Answers

ThomasIsCoding

r2evans

Recent Activity

Donate For Us

Transform dataframe to long format based on text

Tags:

r

tidyr

Ekd

2 Answers

ThomasIsCoding

r2evans

Related questions

Recent Activity

Donate For Us