How do I convert a wide dataframe to a long dataframe for a multilevel structure with 'quadruple nesting'?

I conducted a study that, in retrospect (one lives, one learns :-)) appears to generate multilevel data. Now I'm trying to restructure the dataset from wide to long so that I can analyse it using e.g. lme4.

In doing so, I encounter an, um, challenge that I've run into a few times before, but for which I've never found a good solution. I've searched again this time, but I'm probably using the wrong keywords - or this problem is much rarer than I thought.

Basically, in this dataset, the variable names indicate which measure each column holds. I asked participants to grade (rate) interventions (could be anything really). Each intervention is in one of 6 behavioral domains. In addition, participants rated each intervention either when it was presented on its own, or simultaneously with one other intervention, or with two other interventions. There were three types of interventions, and they were all rated before (t0) and after (t1) I presented the participants with some information.

So, in effect, I have a dataframe that can be regenerated like this:

### Elements of the variable names
measurementMomentsVector <- c("t0", "t1");
interventionTypesVector <- c("fear", "know", "scd");
nrOfInterventionsSimultaneouslyVector <- c(1, 2, 3);
behaviorDomainsVector <- c("diet", "pox", "alc", "smoking", "traff", "adh");

### Generate a vector with all variable names
variableNames <-
  apply(expand.grid(measurementMomentsVector,
                    interventionTypesVector,
                    nrOfInterventionsSimultaneouslyVector,
                    behaviorDomainsVector),
        1, paste0, collapse="_");

### Generate 5 'participants' worth of data
wideData <- data.frame(matrix(rnorm(5*length(variableNames)), nrow=5));

### Assign names
names(wideData) <- variableNames;

### Add unique id variable for every participant
wideData$id <- 1:5;

So using head(wideData)[, 1:5] you can see roughly what the dataframe looks like:

  t0_fear_1_diet t1_fear_1_diet t0_know_1_diet t1_know_1_diet t0_scd_1_diet
1     -0.9338191      0.9747453      1.0069036      0.3500103  -0.844699708
2      0.8921867      1.3687834     -1.2005791      0.2747955   1.316768219
3      1.6200200      0.5245470     -1.2910586      1.3211912  -0.174795144
4      0.1543738      0.7535642      0.4726131     -0.3464789  -0.009190702
5     -1.3676692     -0.4491574     -2.0902003     -0.3484678  -2.537501824

Now, I want to convert this data to a long dataframe with 6 variables, for example 'id', 'measurementMoment', 'interventionType', 'nrOfInterventionsSimultaneously', 'behaviorDomain', and 'evaluation'. The first variable denotes the participant to which a record belongs, the last variable is the score (rating, grade, evaluation) the participant gave a specific intervention, and the four variables in between indicate exactly which intervention is being rated.

I can probably write some 'custom' code just for this problem, but I expect R 'has something for this'. I've been playing around with reshape2 and with base R's reshape(), e.g.:

longData <- reshape(wideData, varying=1:(ncol(wideData)-1),
                    idvar="id",
                    sep="_", direction="long")

But it doesn't manage to guess the time-varying variables:

Error in guess(varying) : 
  failed to guess time-varying variables from their names

I have struggled with this a few times now, and I haven't managed to find any answers online. And now I really need to move on, so I thought I'd try this as a last effort before resorting to writing something custom-made :-)

I would greatly appreciate any pointers anybody can give!!!

asked Jul 29 '15 17:07

Matherion

1 Answer

I think your problem can be solved with a two-step approach:

1. melt your data into a long data.frame (or, as I did, into a long data.table)
2. split the variable column with all the labels into separate columns for each required grouping variable.

As the information for this is in the labels, this can easily be achieved with the tstrsplit function from the data.table package.

This is what you might be looking for:

library(data.table)
longData <- melt(setDT(wideData), id.vars="id")
longData[, c("moment", "intervention", "number", "behavior") := 
                tstrsplit(variable, "_", type.convert = TRUE)
       ][, variable:=NULL]

the result:

> head(longData,15)
    id       value moment intervention number behavior
 1:  1 -0.07747254     t0         fear      1     diet
 2:  2 -0.76207379     t0         fear      1     diet
 3:  3  1.15501244     t0         fear      1     diet
 4:  4  1.24792369     t0         fear      1     diet
 5:  5 -0.28226121     t0         fear      1     diet
 6:  1 -1.04875354     t1         fear      1     diet
 7:  2 -0.91436882     t1         fear      1     diet
 8:  3  0.72863487     t1         fear      1     diet
 9:  4  0.10934261     t1         fear      1     diet
10:  5 -0.06093002     t1         fear      1     diet
11:  1 -0.70725760     t0         know      1     diet
12:  2  1.06309003     t0         know      1     diet
13:  3  0.89501164     t0         know      1     diet
14:  4  1.48148316     t0         know      1     diet
15:  5  0.22086835     t0         know      1     diet
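
A quick sanity check on the reshaped data can be sketched as follows. This is a minimal, self-contained sketch that rebuilds the example data from the question (using `do.call(paste, ...)` instead of `apply()` to build the names, and `set.seed()` only for reproducibility; both are my choices, not part of the original). It assumes a data.table version in which `tstrsplit()` accepts `type.convert = TRUE`. With 2 x 3 x 3 x 6 = 108 measured columns and 5 participants, the long table should have 540 rows, and every combination of id and the four grouping variables should occur exactly once.

```r
library(data.table)

# Rebuild the example data from the question (values are random)
variableNames <- do.call(paste, c(expand.grid(c("t0", "t1"),
                                              c("fear", "know", "scd"),
                                              c("1", "2", "3"),
                                              c("diet", "pox", "alc",
                                                "smoking", "traff", "adh")),
                                  sep = "_"))
set.seed(1)  # only so the sketch is reproducible
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Step 1: melt to long; step 2: split the labels into four columns
longData <- melt(as.data.table(wideData), id.vars = "id")
longData[, c("moment", "intervention", "number", "behavior") :=
           tstrsplit(variable, "_", fixed = TRUE, type.convert = TRUE)]
longData[, variable := NULL]

# One row per participant per measured column
expectedRows <- nrow(wideData) * (ncol(wideData) - 1)
nrow(longData) == expectedRows

# Each id x moment x intervention x number x behavior cell occurs once
cellCounts <- longData[, .N, by = .(id, moment, intervention, number, behavior)]
all(cellCounts$N == 1)
```

If either check comes back FALSE, the most likely cause is a column name that does not follow the four-part `moment_intervention_number_behavior` pattern.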

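As an alternative to data.table's tstrsplit, you can also split the variable column with the cSplit function of the splitstackshape package. Apply it to the melted data while the variable column is still present (i.e. before removing it with variable:=NULL); you will have to rename the resulting columns afterwards, though: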

library(splitstackshape)
longData <- cSplit(longData, sep="_", "variable", "wide", type.convert=TRUE)
names(longData) <- c("id","value","moment","intervention","number","behavior")

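or with tidyr (again starting from the melted data that still has the variable column; convert=TRUE turns the number column into an integer):

library(tidyr)
separate(longData, variable, c("moment", "intervention", "number", "behavior"), sep="_", remove=TRUE, convert=TRUE)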
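
With more recent versions of tidyr, melting and splitting can probably be combined into one call to `pivot_longer()`, which accepts several names in `names_to` together with `names_sep`. The sketch below is self-contained and rebuilds the example data from the question; it assumes tidyr 1.1.0 or later (for `names_transform`), and the column names are simply the ones the question asked for.

```r
library(tidyr)

# Rebuild the example data from the question (values are random)
variableNames <- do.call(paste, c(expand.grid(c("t0", "t1"),
                                              c("fear", "know", "scd"),
                                              c("1", "2", "3"),
                                              c("diet", "pox", "alc",
                                                "smoking", "traff", "adh")),
                                  sep = "_"))
set.seed(1)  # only so the sketch is reproducible
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Reshape every column except id, splitting each name on "_" into four parts
longData <- pivot_longer(
  wideData,
  cols = -id,
  names_to = c("measurementMoment", "interventionType",
               "nrOfInterventionsSimultaneously", "behaviorDomain"),
  names_sep = "_",
  # store the number of simultaneous interventions as an integer
  names_transform = list(nrOfInterventionsSimultaneously = as.integer),
  values_to = "evaluation"
)

head(longData)
```

Note that the rows come out ordered by participant rather than by variable, so the row order differs from the melt-based result even though the content is the same.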

answered Sep 17 '22 12:09

Jaap


Tags: dataframe, r, reshape, reshape2
