I have a data frame like this:
structure(list(one = structure(1:4, .Label = c("a", "b", "c",
"d"), class = "factor"), two = c(2, 4, 7, 3), x.1 = c("x1a",
"x1b", "x1c", "x1d"), x.2 = c("x2a", "x2b", "x2c", "x2d"), x.3 = c("x3a",
"x3b", "x3c", "x3d"), y.1 = c(NA, "y1b", "y1c", NA), y.2 = c(NA,
"y2b", "y2c", NA), y.3 = c(NA, "y3b", "y3c", NA)), .Names = c("one",
"two", "x.1", "x.2", "x.3", "y.1", "y.2", "y.3"), row.names = c(NA,
-4L), class = "data.frame")
As you can see, the observations per event a, b, c, and d (variable "one") are stored as columns, where x and y define separate observations and 1, 2 and 3 define the variables. Variable "two" does not have a meaning here.
I like to reshape this data frame to have it tidy in the form that each observation has it's own row and each variable it's own column.
The final data frame should look like this:
structure(list(one = structure(c(1L, 2L, 2L, 3L, 3L, 4L), .Label = c("a",
"b", "c", "d"), class = "factor"), two = c(2, 4, 2, 7, 5, 3),
var1 = c("x1a", "x1b", "y1b", "x1c", "y1c", "x1d"), var2 = c("x2a",
"x2b", "y2b", "x2c", "y2c", "x2d"), var3 = c("x3a", "x3b",
"y3b", "x3c", "y3c", "x3d")), .Names = c("one", "two", "var1",
"var2", "var3"), row.names = c(1L, 2L, 5L, 3L, 6L, 4L), class = "data.frame")
I am slightly familiar with what the cast and melt function from the reshape packages do, but was not able yet to figure out a way to reshape the DF in a smart way. For now the following provides the sate that I have gotten to:
df.between <- melt(df.in, id.vars=c("one", "two"))
df.between$variable <- gsub("x.|y.", "", df.between$variable)
Now the "variable" column does correctly identify the variable (1, 2 or 3). However, I was not able to cast this into the required form and this solution does not seem to be useful for larger sets of data due to the use of grepl
.
Happy to get a nudge into the right direction here.
We can use melt
from the devel version of data.table
i.e. v1.9.5
, which can handle multiple patterns
for the measure
variables.
library(data.table)
melt(setDT(df1), measure=patterns('.1', '.2', '.3'),
na.rm=TRUE, value.name=paste0('var', 1:3))[, variable:=NULL][order(one)]
# one two var1 var2 var3
#1: a 2 x1a x2a x3a
#2: b 4 x1b x2b x3b
#3: b 4 y1b y2b y3b
#4: c 7 x1c x2c x3c
#5: c 7 y1c y2c y3c
#6: d 3 x1d x2d x3d
EDIT: We don't need c
inside the patterns
and it will also give exact matches (from @Jaap's comments).
melt
from "data.table" will be much faster than the following, but you could also consider merged.stack
from my "splitstackshape" package:
library(splitstackshape)
na.omit(merged.stack(mydf, var.stubs = c(".1", ".2", ".3"),
sep = "var.stubs", atStart = FALSE))
# one two .time_1 .1 .2 .3
# 1: a 2 x x1a x2a x3a
# 2: b 4 x x1b x2b x3b
# 3: b 4 y y1b y2b y3b
# 4: c 7 x x1c x2c x3c
# 5: c 7 y y1c y2c y3c
# 6: d 3 x x1d x2d x3d
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With