Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Consolidate duplicate columns and concatenate column names

I'm sure this must be a duplicate. Say I have this:

stage <- seq(1,3,1)
exp1 <- c("A","B","C")
exp2 <- c("A","B","C")
exp3 <- c(NA, "B","C")
exp4 <- c("D","B","C")
exp5 <- c("A","B","C")
exp6 <- c(NA, "B","C")

df <- data.frame(stage, exp1, exp2, exp3, exp4, exp5, exp6)

  stage exp1 exp2 exp3 exp4 exp5 exp6
1     1    A    A <NA>    D    A <NA>
2     2    B    B    B    B    B    B
3     3    C    C    C    C    C    C

I want to combine all duplicated columns and show this by concatenating the column names. I can find duplicated columns with:

df[duplicated(lapply(df, summary))]

exp2 exp5 exp6
1    A    A <NA>
2    B    B    B
3    C    C    C

But I can't figure out how to combine the duplicated column names such that I get something like this:

  stage exp1_exp2_exp5 exp3_exp6 exp4
1     1              A      <NA>    D
2     2              B         B    B
3     3              C         C    C

Perhaps (probably) this is a poor way to visualize a comparison between different "exp"s? Maybe I need to reshape to show this more clearly?

like image 280
Pete900 Avatar asked Jan 01 '26 15:01

Pete900


2 Answers

With base R, you can use match on the unique set of vectors in the data.frame to get the groupings, feed this to split, with the names of the variables as the first argument to break up the names by group, use sapplyand paste to concatenate the variable names in each group, and then use setNames to provide the names to the unique set of columns.

setNames(as.data.frame(unique(as.list(df))),
         sapply(split(names(df), match(as.list(df), unique(as.list(df)))),
                paste, collapse="-"))

which returns

  stage exp1-exp2-exp5 exp3-exp6 exp4
1     1              A      <NA>    D
2     2              B         B    B
3     3              C         C    C
like image 86
lmo Avatar answered Jan 03 '26 05:01

lmo


Looking for something like this?

library(dplyr)
library(tidyr)
df %>% 
  gather(variable, value, -1) %>% 
  group_by(variable) %>% 
  summarise(values = paste(sort(value), collapse = ',')) %>% 
  group_by(values) %>% 
  summarise(cols = paste(variable, collapse = '_')) %>% 
  separate_rows(values) %>% 
  left_join(df %>% gather(variable, value, -1, na.rm = TRUE), ., by = c('value'='values')) %>% 
  select(-variable) %>% 
  distinct() %>% 
  spread(cols, value)

which gives:

  stage exp1_exp2_exp5 exp3_exp6 exp4
1     1              A      <NA>    D
2     2              B         B    B
3     3              C         C    C
like image 37
Jaap Avatar answered Jan 03 '26 06:01

Jaap



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!