I conducted a study that, in retrospect (one lives, one learns :-)), appears to have generated multilevel data. Now I'm trying to restructure the dataset from wide to long so that I can analyse it using e.g. lme4.
In doing so, I encounter an, um, challenge that I've run into a few times before, but for which I've never found a good solution. I've searched again this time, but I'm probably using the wrong keywords, or this problem is much rarer than I thought.
Basically, in this dataset, the variable names indicate for which measure the data were collected. I asked participants to grade (rate) interventions (could be anything really). Each intervention is in one of 6 behavioral domains. In addition, participants rated each intervention either when it was presented on its own, or simultaneously with one other intervention, or with two other interventions. There were three types of interventions, and they were all rated before (t0) and after (t1) I presented the participants with some information.
So, in effect, I have a dataframe that can be regenerated like this:
### Elements of the variable names
measurementMomentsVector <- c("t0", "t1");
interventionTypesVector <- c("fear", "know", "scd");
nrOfInterventionsSimultaneouslyVector <- c(1, 2, 3);
behaviorDomainsVector <- c("diet", "pox", "alc", "smoking", "traff", "adh");
### Generate a vector with all variable names
variableNames <-
  apply(expand.grid(measurementMomentsVector,
                    interventionTypesVector,
                    nrOfInterventionsSimultaneouslyVector,
                    behaviorDomainsVector),
        1, paste0, collapse="_");
### Generate 5 'participants' worth of data
wideData <- data.frame(matrix(rnorm(5*length(variableNames)), nrow=5));
### Assign names
names(wideData) <- variableNames;
### Add unique id variable for every participant
wideData$id <- 1:5;
So using head(wideData)[, 1:5] you can see roughly what the dataframe looks like:
  t0_fear_1_diet t1_fear_1_diet t0_know_1_diet t1_know_1_diet t0_scd_1_diet
1     -0.9338191      0.9747453      1.0069036      0.3500103  -0.844699708
2      0.8921867      1.3687834     -1.2005791      0.2747955   1.316768219
3      1.6200200      0.5245470     -1.2910586      1.3211912  -0.174795144
4      0.1543738      0.7535642      0.4726131     -0.3464789  -0.009190702
5     -1.3676692     -0.4491574     -2.0902003     -0.3484678  -2.537501824
Now, I want to convert this data to a long dataframe with 6 variables, for example 'id', 'measurementMoment', 'interventionType', 'nrOfInterventionsSimultaneously', 'behaviorDomain', and 'evaluation'. The first variable denotes the participant to which a record belongs, the last variable is the score (rating, grade, evaluation) that participant gave a specific intervention, and the four variables in between indicate exactly which intervention is being rated.
I can probably write some 'custom' code just for this problem, but I expect R 'has something for this'. I've been playing around with reshape2 and with base R's reshape(), e.g.:
longData <- reshape(wideData, varying=1:(ncol(wideData)-1),
                    idvar="id",
                    sep="_", direction="long")
But it doesn't manage to guess the time-varying variables:
Error in guess(varying) :
failed to guess time-varying variables from their names
I have been struggling with this a few times now, and I don't manage to find any answers online. And now I really need to move on, so I thought I'd try this as a last effort before resorting to writing something custom-made :-)
I would greatly appreciate any pointers anybody can give!!!
The easiest way to reshape data between these formats is to use two functions from the tidyr package in R: pivot_longer(), which reshapes a data frame from wide to long format, and pivot_wider(), which reshapes a data frame from long to wide format.
To reshape a dataframe from long to wide in Pandas, we can use the pd.pivot() function.
A wide format contains values that do not repeat in the first column. A long format contains values that do repeat in the first column.
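For column names like the ones in the question, the pivot_longer() route might look roughly like the sketch below. It uses a small stand-in for wideData (two participants, four columns) rather than the full dataset, and the output column names are simply the ones proposed in the question; names_sep splits each column name on the underscore, and names_transform is used here on the assumption that the number of simultaneous interventions should end up as an integer.

```r
library(tidyr)

# Small stand-in for the question's wideData; the column names follow the
# same moment_type_number_domain pattern (illustrative data only).
set.seed(42)
wideData <- data.frame(
  t0_fear_1_diet = rnorm(2),
  t1_fear_1_diet = rnorm(2),
  t0_know_2_alc  = rnorm(2),
  t1_know_2_alc  = rnorm(2),
  id = 1:2
)

# One row per participant per rated intervention; the four name parts
# become four separate columns.
longData <- pivot_longer(
  wideData,
  cols      = -id,
  names_to  = c("measurementMoment", "interventionType",
                "nrOfInterventionsSimultaneously", "behaviorDomain"),
  names_sep = "_",
  values_to = "evaluation",
  names_transform = list(nrOfInterventionsSimultaneously = as.integer)
)

head(longData)
```

This does the melting and the splitting in a single call, at the cost of depending on a reasonably recent tidyr.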
I think your problem can be solved with a two-step approach:
1. Melt your data into a long data.frame (or, as I did, into a long data.table).
2. Split the variable column with all the labels into separate columns for each required grouping variable. As the information for this is in the labels, this can easily be achieved with the tstrsplit function from the data.table package.
This is what you might be looking for:
library(data.table)
longData <- melt(setDT(wideData), id.vars="id")
longData[, c("moment", "intervention", "number", "behavior") :=
           tstrsplit(variable, "_", type.convert = TRUE)
         ][, variable:=NULL]
the result:
> head(longData,15)
    id       value moment intervention number behavior
 1:  1 -0.07747254     t0         fear      1     diet
 2:  2 -0.76207379     t0         fear      1     diet
 3:  3  1.15501244     t0         fear      1     diet
 4:  4  1.24792369     t0         fear      1     diet
 5:  5 -0.28226121     t0         fear      1     diet
 6:  1 -1.04875354     t1         fear      1     diet
 7:  2 -0.91436882     t1         fear      1     diet
 8:  3  0.72863487     t1         fear      1     diet
 9:  4  0.10934261     t1         fear      1     diet
10:  5 -0.06093002     t1         fear      1     diet
11:  1 -0.70725760     t0         know      1     diet
12:  2  1.06309003     t0         know      1     diet
13:  3  0.89501164     t0         know      1     diet
14:  4  1.48148316     t0         know      1     diet
15:  5  0.22086835     t0         know      1     diet
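If you want the column names from the question and a quick check that nothing got lost, something along these lines should work. It is only a sketch: it rebuilds the example data in a slightly different way (do.call(paste, ...) instead of apply()), uses the six names proposed in the question, and relies on the design having 2 x 3 x 3 x 6 = 108 ratings per participant, so 5 participants should give 540 rows.

```r
library(data.table)

# Rebuild the example data from the question (same name pattern).
set.seed(1)
nameParts <- expand.grid(c("t0", "t1"),
                         c("fear", "know", "scd"),
                         c("1", "2", "3"),
                         c("diet", "pox", "alc", "smoking", "traff", "adh"),
                         stringsAsFactors = FALSE)
variableNames <- do.call(paste, c(nameParts, sep = "_"))
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Step 1: melt to long, naming the value column as in the question.
longData <- melt(as.data.table(wideData), id.vars = "id",
                 value.name = "evaluation", variable.factor = FALSE)

# Step 2: split the labels into the four grouping variables.
longData[, c("measurementMoment", "interventionType",
             "nrOfInterventionsSimultaneously", "behaviorDomain") :=
           tstrsplit(variable, "_", fixed = TRUE, type.convert = TRUE)]
longData[, variable := NULL]

# Sanity checks: 108 ratings for each of the 5 participants.
stopifnot(nrow(longData) == 5 * 108)
stopifnot(all(longData[, .N, by = id]$N == 108))
```

Using as.data.table() instead of setDT() leaves the original wideData untouched, which can be convenient if you still need the wide version afterwards.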
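As an alternative to data.table, you can also split the variable column with the cSplit function of the splitstackshape package (you will have to rename the resulting variable columns afterwards though). Note that this has to be applied to the melted data while it still contains the variable column, i.e. before that column is set to NULL: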
library(splitstackshape)
longData <- cSplit(longData, sep="_", "variable", "wide", type.convert=TRUE)
names(longData) <- c("id","value","moment","intervention","number","behavior")
or with tidyr:
library(tidyr)
separate(longData, variable, c("moment", "intervention", "number", "behavior"), sep="_", remove=TRUE)
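If you would rather not depend on any package, the same two steps can be sketched in base R with stack() and strsplit(). This is just an illustration on a small stand-in for wideData (three participants, three columns), with the output column names taken from the question; it assumes every column name has exactly four underscore-separated parts.

```r
# Small stand-in for the question's wideData (illustrative data only).
set.seed(1)
wideData <- data.frame(
  t0_fear_1_diet = rnorm(3),
  t1_fear_1_diet = rnorm(3),
  t0_scd_3_adh   = rnorm(3),
  id = 1:3
)
valueCols <- setdiff(names(wideData), "id")

# Step 1: stack() puts all value columns below each other; 'ind' holds the
# name of the column each value came from.
stacked <- stack(wideData[valueCols])

# Step 2: split the labels on "_" into a character matrix with four columns.
parts <- do.call(rbind, strsplit(as.character(stacked$ind), "_", fixed = TRUE))

longData <- data.frame(
  id = rep(wideData$id, times = length(valueCols)),
  measurementMoment = parts[, 1],
  interventionType = parts[, 2],
  nrOfInterventionsSimultaneously = as.integer(parts[, 3]),
  behaviorDomain = parts[, 4],
  evaluation = stacked$values
)

head(longData)
```

Because stack() works column by column, repeating the id vector once per value column keeps each rating attached to the right participant.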