I have a wide dataframe that I need to gather or melt into a tall dataframe. The part that I'm stuck on is that I have groups of columns that need to remain associated/grouped.
I have 2 users for each form submission and 3 columns of data for each user. I'd like to take these 6 columns and essentially stack them in groups of 3 so that each user is a separate observation.
This is a sample of more or less what my data looks like:
wide <- data.frame(
form.ID = c(1, 2),
entry.date = c("2016-07-01", "2016-06-15"),
user.1 = c("Joe", "Sam"),
user.1.ID = c("A1", "A2"),
user.1.data = c("foo", "lorem"),
user.2 = c("Jane", "Sue"),
user.2.ID = c("B1", "B2"),
user.2.data = c("bar", "ipsum")
)
wide
# form.ID entry.date user.1 user.1.ID user.1.data user.2 user.2.ID user.2.data
# 1 1 2016-07-01 Joe A1 foo Jane B1 bar
# 2 2 2016-06-15 Sam A2 lorem Sue B2 ipsum
This is the desired end state:
# form.ID entry.date user user.ID user.data
# 1 1 2016-07-01 Joe A1 foo
# 1 1 2016-07-01 Jane B1 bar
# 2 2 2016-06-15 Sam A2 lorem
# 2 2 2016-06-15 Sue B2 ipsum
I found this question, but I can't get the given answers to work in my case:
Gather multiple sets of columns
I tried:
tall.almost <- gather(wide, user.n, user.name, user.1, user.2)
tall.almost
# form.ID entry.date user.1.ID user.1.data user.2.ID user.2.data user.n user.name
# 1 1 2016-07-01 A1 foo B1 bar user.1 Joe
# 2 2 2016-06-15 A2 ipsum B2 lorem user.1 Sam
# 3 1 2016-07-01 A1 foo B1 bar user.2 Jane
# 4 2 2016-06-15 A2 ipsum B2 lorem user.2 Sue
I thought to use a sequence of gather() functions like the one above, but I get a duplicate data.
I tried:
tall.not.quite <- gather(wide, user.n, user.name, -form.ID, -date)
tall.not.quite
form.ID entry.date user.n user.name
1 1 2016-07-01 user.1 Joe
2 2 2016-06-15 user.1 Sam
3 1 2016-07-01 user.1.ID A1
4 2 2016-06-15 user.1.ID A2
5 1 2016-07-01 user.1.data foo
6 2 2016-06-15 user.1.data ipsum
7 1 2016-07-01 user.2 Jane
8 2 2016-06-15 user.2 Sue
9 1 2016-07-01 user.2.ID B1
10 2 2016-06-15 user.2.ID B2
11 1 2016-07-01 user.2.data bar
12 2 2016-06-15 user.2.data lorem
thinking I could then use spread() to pull out the user.n.ID and user.n.data fields, but I can't get that to work either. I end up back where I started.
I'm pretty good and stuck. This R newby would really appreciate any help.
Thanks!
We can use melt
from data.table
which can take multiple measure
columns.
library(data.table)
melt(setDT(wide), measure = patterns("\\d+$", "user.*ID$", "data$"),
value.name = c("user", "user.ID", "user.data"))[,
variable:= NULL][order(form.ID)]
# form.ID entry.date user user.ID user.data
# 1: 1 2016-07-01 Joe A1 foo
# 2: 1 2016-07-01 Jane B1 bar
# 3: 2 2016-06-15 Sam A2 lorem
# 4: 2 2016-06-15 Sue B2 ipsum
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With