I have a dataset from a survey. The setup of the survey was as follows: Q1 was a multiple choice question, and in Q2 answers are given for each of the selected choices in Q1 (same for Q3 up to Q9). After exporting and uploading to R, the dataframe is as follows:
df <- data.frame(
ID = 1:3,
Q1 = c("Purple, Red, Blue", "Red, Blue, Yellow", "Red"),
"Q2: Is it your favorite color? - Purple" = c("Average", NA, NA),
"Q2: Is it your favorite color? - Red" = c("No", "No", "Average"),
"Q2: Is it your favorite color? - Blue" = c("Yes", "No", NA),
"Q2: Is it your favorite color? - Yellow" = c(NA, "Average", NA)
)
What I'm looking to do is to transform so that the dataframe is a long format like below
| ID | Q1 | Q2 |
|---|---|---|
| 1 | Purple | Average |
| 1 | Red | No |
| 1 | Blue | Yes |
| 2 | Red | No |
| 2 | Blue | No |
| 2 | Yellow | Average |
| 3 | Red | Average |
I was thinking to first separate the answers to Q1 into rows, and then pivot_longer the Q2 columns based on the selected option in Q2.
df1 <- df %>%
separate_rows(Q1, sep = ",")
df2 <- df1 %>%
pivot_longer(cols = matches(Q1),
names_to = Q2,
values_to = Q2_answer)
But this doesn't work (error in pivot_longer: selections can't have missing values).
What am I doing wrong? Or is there another way to do this?
Probably this could help
df %>%
select(-Q1) %>%
pivot_longer(-ID, names_to = "Q1", values_to = "Q2") %>%
na.omit() %>%
mutate(Q1 = str_extract(Q1, "\\w+$"))
or with values_drop_na = TRUE (as suggested by M-- in the comment)
df %>%
select(-Q1) %>%
pivot_longer(-ID, names_to = "Q1", values_to = "Q2", values_drop_na = TRUE) %>%
mutate(Q1 = str_extract(Q1, "\\w+$"))
which gives
# A tibble: 7 × 3
ID Q1 Q2
<int> <chr> <chr>
1 1 Purple Average
2 1 Red No
3 1 Blue Yes
4 2 Red No
5 2 Blue No
6 2 Yellow Average
7 3 Red Average
Try this:
library(dplyr)
library(tidyr)
pivot_longer(df, cols = -c(ID, Q1), values_to = "Q2") |>
separate_longer_delim(Q1, delim = ",") |>
mutate(Q1 = trimws(Q1)) |>
filter(sub(".* ", "", name) == Q1) |>
select(-name)
# # A tibble: 7 × 3
# ID Q1 Q2
# <int> <chr> <chr>
# 1 1 Purple Average
# 2 1 Red No
# 3 1 Blue Yes
# 4 2 Red No
# 5 2 Blue No
# 6 2 Yellow Average
# 7 3 Red Average
Data, adding check.names=:
df <- data.frame(
ID = 1:3,
Q1 = c("Purple, Red, Blue", "Red, Blue, Yellow", "Red"),
"Q2: Is it your favorite color? - Purple" = c("Average", NA, NA),
"Q2: Is it your favorite color? - Red" = c("No", "No", "Average"),
"Q2: Is it your favorite color? - Blue" = c("Yes", "No", NA),
"Q2: Is it your favorite color? - Yellow" = c(NA, "Average", NA),
check.names = FALSE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With