Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace variable names with row values and vice versa

Tags:

r

dplyr

I have data in an unusual format. What should be the variable names / column headings are currently row values, and what should be the row values are the variable names / column headings.

That is, I have a dataframe like this:

id <- seq(1, 5, 1)
good <- c('', 'Q4', 'Q4, Q2', '', '')
ok <- c('Q3, Q1', '', '', 'Q2', '')
bad <- c('', 'Q2', 'Q2', '', '')

data <- as.data.frame(cbind(id, good, ok, bad))
cols <- c('good', 'ok', 'bad')
data[cols] <- lapply(data[cols], as.character)

And I would like to convert it into a dataframe that looks like this:

id <- seq(1,5,1)
Q1 <- c('ok', '', '', '', '')
Q2 <- c('', 'bad', 'good, bad', 'ok', '') # Yes, it is possible to get multiple, 
# conflicting responses to a question from one id.
Q3 <- c('ok', '', '', '', '')
Q4 <- c('', 'good', 'good', '', '')

data_new <- as.data.frame(cbind(id, Q1, Q2, Q3, Q4))
cols <- c('Q1', 'Q2', 'Q3', 'Q4')
data_new[cols] <- lapply(data_new[cols], as.character)

Some issues:

  • What are currently the row values can't just be transposed directlyto the column headings because sometimes multiple entries are recorded in a single cell and there should be only one value recorded in a column heading (e.g., I don't want a columnheading labelled 'Q2, Q4' in the updated dataframe).

  • On the other hand, when the column headings are moved to become row values, there will sometimes need to be multiple entries in the same cell. For example, the entry under the variable Q2 for id 3 should read 'good, bad' in the updated dataframe.

A dplyr solution is preferred but answers using other packages/base R are also welcomed.

like image 339
userLL Avatar asked Apr 07 '26 18:04

userLL


1 Answers

Here is one way with gather/spread. We gather the data from 'wide' to 'long', filter out the rows based on the blank elements in 'val', expand the data by splitting at the delimiter (,) in 'val', grouped by 'id', 'val', paste the elements of 'key' and spread it back to 'wide' format

library(tidyverse)
data %>%
   gather(key, val, -id) %>% 
   filter(val != "") %>% 
   separate_rows(val) %>% 
   group_by(id = factor(id, levels = 1:5), val) %>% 
   summarise(key = toString(key)) %>%
   spread(val, key, fill = "", drop = FALSE)
# A tibble: 5 x 5
# Groups:   id [5]
#  id    Q1    Q2        Q3    Q4   
#  <fct> <chr> <chr>     <chr> <chr>
#1 1     ok    ""        ok    ""   
#2 2     ""    bad       ""    good 
#3 3     ""    good, bad ""    good 
#4 4     ""    ok        ""    ""   
#5 5     ""    ""        ""    ""   

NOTE: The format is based on the print format of tibble. If we want to change to data.frame, use as.data.frame at the end

like image 187
akrun Avatar answered Apr 10 '26 10:04

akrun