tidyr::gather multiple columns of varying types

Question

My question is similar to this question. I'm trying to tidyr::gather multiple columns. However, the solution provided in the link is less than ideal because the attributes are generally not identical across all columns and so they are dropped.

Note, I know how to do this with base R, but I'm trying to learn how to do the equivalent operation with tidyr and/or dplyr.

Below I've simulated some data (poorly, but quickly) that illustrate the situation I often find myself in (although I generally have far more columns that follow this same sort of a pattern). I've provided the base solution with stats::reshape so you can see my desired output.

Any help would be much appreciated.

set.seed(123)
male_g6 <- rbinom(100, 1, .5)
ell_g6 <- rbinom(100, 1, .1)
sped_g6 <- rbinom(100, 1, .15)
pullouts_g6 <- rbinom(100, 5, .1)
disability_g6 <- replicate(100, 
                sample(
                    c("asd", "cd", "ed", "hi", "id", "ohi", "ld", "none"),
                    1,
                    prob = c(rep(0.01, 6), 0.05, 0.89)
                    )
                 )
score_g6 <- rnorm(100, 200, 10)
score_g7 <- score_g6 + 5 + rnorm(100, 0, 2)
score_g8 <- score_g7 + 5 + rnorm(100, 0, 2)

d <- data.frame(
        SID = 1:100,
        male_g6 = male_g6,
        male_g7 = male_g6,
        male_g8 = male_g6,
        ell_g6 = ell_g6,
        ell_g7 = ell_g6,
        ell_g8 = ell_g6,
        sped_g6 = sped_g6,
        sped_g7 = sped_g6,
        sped_g8 = sped_g6,
        pullouts_g6 = pullouts_g6,
        pullouts_g7 = pullouts_g6,
        pullouts_g8 = pullouts_g6,
        disability_g6 = disability_g6,
        disability_g7 = disability_g6,
        disability_g8 = disability_g6,
        score_g6 = score_g6,
        score_g7 = score_g7,
        score_g8 = score_g8
    )

With base reshape

ld <- stats::reshape(d,
        idvar = "SID",
        varying = list(
            c("male_g6", "male_g7", "male_g8"),
            c("ell_g6", "ell_g7", "ell_g8"),
            c("sped_g6", "sped_g7", "sped_g8"),
            c("pullouts_g6", "pullouts_g7", "pullouts_g8"),
            c("disability_g6", "disability_g7", "disability_g8"),
            c("score_g6", "score_g7", "score_g8")
            ),
        v.names = c("male", "ell", "sped", "pullouts", "disability", "score"),
        times = 6:8,
        timevar = "Grade",
        direction = "long"
    )
ld <- ld[order(ld$SID), ]

alistaire · Accepted Answer

You'll need to gather beyond what you want to end with so you can separate the grade level from the headers, after which you can spread back to wide form:

ld2 <- d %>% gather(var, val, -SID) %>%     # gather to long form
    # separate grade from variable names
    separate(var, c('var', 'grade'), sep = '_g', convert = TRUE) %>% 
    spread(var, val, convert = TRUE)    # spread back to wide

head(ld2)

##   SID grade disability ell male pullouts    score sped
## 1   1     6         cd   0    0        1 196.2440    0
## 2   1     7         cd   0    0        1 203.2739    0
## 3   1     8         cd   0    0        1 211.1347    0
## 4   2     6       none   0    1        0 194.3812    1
## 5   2     7       none   0    1        0 195.3957    1
## 6   2     8       none   0    1        0 202.4890    1

tidyr::gather multiple columns of varying types

Tags:

r

dplyr

tidyr

tidyverse

With base reshape

Daniel Anderson

1 Answers

alistaire

Recent Activity

Donate For Us

tidyr::gather multiple columns of varying types

Tags:

r

dplyr

tidyr

tidyverse

With base reshape

Daniel Anderson

1 Answers

alistaire

Related questions

Recent Activity

Donate For Us