I want to tidy a dataset by manipulating the data this way:
age gender education previous_comp_exp tutorial_time qID.1 time_taken.1 qID.2 time_taken.2
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343 sor8 16.15616
To become this:
age gender education previous_comp_exp tutorial_time qID time_taken
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206
18 Male Undergraduate casual gamer 62.17926 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343
24 Male Undergraduate casual gamer 85.01288 sor8 16.15616
I've experimented with gather()
, but I can only only get it working with one column and I keep getting this warning:
Warning message:attributes are not identical across measure variables; they will be dropped
Any ideas?
With melt
from data.table
(see ?patterns
):
library(data.table)
melt(setDT(df), measure = patterns("^qID", "^time_taken"),
value.name = c("qID", "time_taken"))
Result:
age gender education previous_comp_exp tutorial_time variable qID time_taken
1: 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206
2: 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343
3: 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920
4: 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616
or with tidyr
:
library(dplyr)
library(tidyr)
df %>%
gather(variable, value, qID.1:time_taken.2) %>%
mutate(variable = sub("\\.\\d$", "", variable)) %>%
group_by(variable) %>%
mutate(ID = row_number()) %>%
spread(variable, value, convert = TRUE) %>%
select(-ID)
Result:
# A tibble: 4 x 7
age gender education previous_comp_exp tutorial_time qID time_taken
<int> <fctr> <fctr> <fctr> <dbl> <chr> <dbl>
1 18 Male Undergraduate casual_gamer 62.17926 sor9 39.61206
2 18 Male Undergraduate casual_gamer 62.17926 sor8 19.48920
3 24 Male Undergraduate casual_gamer 85.01288 sor9 50.92343
4 24 Male Undergraduate casual_gamer 85.01288 sor8 16.15616
Note:
For the tidyr
method, convert=TRUE
is used to convert time_taken
back to numeric
, since it was coerced to character when gather
ed with the qID
columns.
Data:
df = structure(list(age = c(18L, 24L), gender = structure(c(1L, 1L
), .Label = "Male", class = "factor"), education = structure(c(1L,
1L), .Label = "Undergraduate", class = "factor"), previous_comp_exp = structure(c(1L,
1L), .Label = "casual_gamer", class = "factor"), tutorial_time = c(62.17926,
85.01288), qID.1 = structure(c(1L, 1L), .Label = "sor9", class = "factor"),
time_taken.1 = c(39.61206, 50.92343), qID.2 = structure(c(1L,
1L), .Label = "sor8", class = "factor"), time_taken.2 = c(19.4892,
16.15616)), .Names = c("age", "gender", "education", "previous_comp_exp",
"tutorial_time", "qID.1", "time_taken.1", "qID.2", "time_taken.2"
), class = "data.frame", row.names = c(NA, -2L))
In base R you can use the powerful reshape
to transform your data from wide to long format in one-line statement:
reshape(dx,direction="long",
varying=list(grep("qID",colnames(dx)),
grep("time_taken",colnames(dx))),
v.names=c("qID","time_taken"))
age gender education previous_comp_exp tutorial_time time qID time_taken id
1.1 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206 1
2.1 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343 2
1.2 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920 1
2.2 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With