
Gather duplicate column sets into single columns

The problem of gathering multiple sets of columns was already addressed here: Gather multiple sets of columns, but in my case, the columns are not unique.

I have the following data:

input <- data.frame(
  id = 1:2, 
  question = c("a", "b"),
  points = 0,
  max_points = c(3, 5),
  question = c("c", "d"),
  points = c(0, 20),
  max_points = c(5, 20),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
input
#>   id question points max_points question points max_points
#> 1  1        a      0          3        c      0          5
#> 2  2        b      0          5        d     20         20

The first column is an id, then I have many repeated columns (the original dataset has 133 columns):

  1. identifier for question
  2. points given
  3. maximum points

I would like to end up with this structure:

expected <- data.frame(
  id = c(1, 2, 1, 2),
  question = letters[1:4],
  points = c(0, 0, 0, 20),
  max_points = c(3, 5, 5, 20),
  stringsAsFactors = FALSE
)
expected
#>   id question points max_points
#> 1  1        a      0          3
#> 2  2        b      0          5
#> 3  1        c      0          5
#> 4  2        d     20         20

I have tried several things:

  • tidyr::gather(input, key, val, -id)
  • reshape2::melt(input, id.vars = "id")

Neither delivers the desired output. Furthermore, with more columns than shown here, gather stops working altogether, because there are too many duplicate columns.

As a workaround I tried this:

library(dplyr)
library(tidyr)

# add numbers to make col headers "unique"
names(input) <- c("id", paste0(seq_len(ncol(input) - 1), names(input)[-1]))

# gather, remove number, spread
input %>% 
  gather(key, val, -id) %>%
  mutate(key = stringr::str_replace_all(key, "[:digit:]", "")) %>%
  spread(key, val)

which gives an error: Duplicate identifiers for rows (3, 9), (4, 10), (1, 7), (2, 8)

This problem was already discussed here: Unexpected behavior with tidyr, but I don't know why/how I should add another identifier. Most likely this is not the main problem, because I probably should approach the whole thing differently.
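For reference, the "another identifier" idea can be sketched as follows. After the numeric prefix is stripped, both `question` columns map to the same key for the same `id`, so the reshape has no way to tell the two sets apart. This is only a sketch under assumptions: it uses `pivot_longer()` from tidyr 1.0.0 or later instead of `gather()`/`spread()`, it assumes the repeated columns always come in blocks of three, and the `set` column and the `..` separator are names made up for illustration.

```r
library(tidyr)

input <- data.frame(
  id = 1:2,
  question = c("a", "b"),
  points = 0,
  max_points = c(3, 5),
  question = c("c", "d"),
  points = c(0, 20),
  max_points = c(5, 20),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumption: every block has exactly three columns, so number them 1,1,1,2,2,2,...
set <- rep(seq_len((ncol(input) - 1) / 3), each = 3)

# Make the headers unique with a suffix that is easy to split off again,
# e.g. "question..1", "points..1", "max_points..1", "question..2", ...
names(input)[-1] <- paste(names(input)[-1], set, sep = "..")

# ".value" keeps question/points/max_points as columns (with their own types);
# the suffix becomes the extra identifier column "set" (hypothetical name)
result <- pivot_longer(
  input,
  cols = -id,
  names_to = c(".value", "set"),
  names_sep = "\\.\\."
)

# Sort by block so the rows follow the order of the expected output
result <- result[order(result$set, result$id), ]
result
```

The `set` column is the additional identifier: together with `id` it makes every row unique. It can be dropped afterwards with `result$set <- NULL` if it is not needed.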

How could I solve my problem, preferably with tidyr or base? I don't know how to use data.table, but in case there is a simple solution, I will settle for that too.

Thomas K asked Mar 04 '26 07:03


1 Answer

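Try this. It assumes the repeated columns come in blocks of three, starting at column 2:

do.call(rbind,
        lapply(seq(2, ncol(input), 3), function(i) {
          # keep the id column plus one block of three columns
          input[, c(1, i:(i + 2))]
        }))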

#   id question points max_points
# 1  1        a      0          3
# 2  2        b      0          5
# 3  1        c      0          5
# 4  2        d     20         20
zx8754 answered Mar 05 '26 19:03


