I have a dataset with over 14000 observations and 43 variables. The data was collected across 11 countries and for two of the questions, participants were asked different variations of the same question based on the country they were in, meaning that for 2 variables I actually have 22 columns. Basically, here is an example of what the df looks like:
df <- data.frame(
  country = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Q1_UK = c(1, 2, 2, NA, NA, NA, NA, NA, NA),
  Q1_FR = c(NA, NA, NA, 2, 1, 2, NA, NA, NA),
  Q1_ES = c(NA, NA, NA, NA, NA, NA, 2, 2, 1),
  Q2_UK = c(1, 1, 2, NA, NA, NA, NA, NA, NA),
  Q2_FR = c(NA, NA, NA, 1, 2, 2, NA, NA, NA),
  Q2_ES = c(NA, NA, NA, NA, NA, NA, 1, 2, 1)
)
country Q1_UK Q1_FR Q1_ES Q2_UK Q2_FR Q2_ES
1 1 1 NA NA 1 NA NA
2 1 2 NA NA 1 NA NA
3 1 2 NA NA 2 NA NA
4 2 NA 2 NA NA 1 NA
5 2 NA 1 NA NA 2 NA
6 2 NA 2 NA NA 2 NA
7 3 NA NA 2 NA NA 1
8 3 NA NA 2 NA NA 2
9 3 NA NA 1 NA NA 1
and so on...
I want to have 2 single variables containing all responses for different countries - with an end result like this:
country Q1 Q2
1 1 1 1
2 1 2 1
3 1 2 2
4 2 2 1
5 2 1 2
6 2 2 2
7 3 2 1
8 3 2 2
9 3 1 1
I was thinking that rotating the dataframe, using fill(), and then rotating again might work but I was not too sure how to go about it and how to make sure that the answers are only filled in by question and not across variables. I am really new to R and I am exhausted so I might just be missing something obvious.
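This can be done with pivot_longer() from tidyr. The special ".value" entry in names_to tells it to use the part of each column name captured by names_pattern (here Q1 or Q2) as the name of the output column, and values_drop_na = TRUE drops the rows that contain only NA, i.e. the ones for countries that do not apply to that participant: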
library(tidyr)
pivot_longer(df, cols = -country, names_to = c(".value"),
names_pattern = "(.*)_.*", values_drop_na = TRUE)
-output
# A tibble: 9 × 3
country Q1 Q2
<int> <int> <int>
1 1 1 1
2 1 2 1
3 1 2 2
4 2 2 1
5 2 1 2
6 2 2 2
7 3 2 1
8 3 2 2
9 3 1 1
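As an alternative, here is a minimal sketch using dplyr::coalesce(), which returns the first non-missing value row by row. It assumes each participant answered at most one country variant per question, as in the example data. The helper name collapse_question is hypothetical and only for illustration; it picks columns by prefix so it should also scale to all 11 countries without listing them by hand.

```r
library(dplyr)

df <- data.frame(
  country = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Q1_UK = c(1, 2, 2, NA, NA, NA, NA, NA, NA),
  Q1_FR = c(NA, NA, NA, 2, 1, 2, NA, NA, NA),
  Q1_ES = c(NA, NA, NA, NA, NA, NA, 2, 2, 1),
  Q2_UK = c(1, 1, 2, NA, NA, NA, NA, NA, NA),
  Q2_FR = c(NA, NA, NA, 1, 2, 2, NA, NA, NA),
  Q2_ES = c(NA, NA, NA, NA, NA, NA, 1, 2, 1)
)

# Hypothetical helper: collapse every column whose name starts with `prefix`
# into one vector, taking the first non-NA value in each row.
collapse_question <- function(data, prefix) {
  cols <- data[startsWith(names(data), prefix)]
  # unname() so the columns are passed to coalesce() as positional arguments
  do.call(coalesce, unname(as.list(cols)))
}

result <- data.frame(
  country = df$country,
  Q1 = collapse_question(df, "Q1_"),
  Q2 = collapse_question(df, "Q2_")
)

result
```

Unlike pivot_longer(), this keeps exactly one row per participant even if someone skipped a question (the value is simply NA), which may or may not be what you want.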