Delete a duplicated column with dplyr

Tags:

r

dplyr

This morning, while doing some analysis on a data frame, I got an error caused by duplicated column names. I tried to find a solution using only dplyr, but I could not find anything that works. Here is an example that illustrates the problem: a data frame with a duplicated column name.

x <- data.frame(matrix(c(1, 2, 3),
                       nrow = 2, ncol = 3, byrow = TRUE))  # two identical rows: 1 2 3
colnames(x) <- c("a", "a", "b")
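As a quick side check (not part of the original question), base R can report which column names are repeated before any dplyr verb runs. This sketch rebuilds the example with an explicit `byrow = TRUE`, which is an assumption about the intended construction, chosen because it reproduces the output shown further down. `duplicated()` flags every name that already appeared earlier, and `anyDuplicated()` returns the index of the first repeat (0 if there is none).

```r
# Rebuild the example data frame: two identical rows of 1, 2, 3
# (byrow = TRUE is an assumption; it matches the output shown in the question)
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

# duplicated() marks each name that already occurred earlier in the vector
dup_positions <- which(duplicated(names(x)))
print(dup_positions)  # the second "a" (position 2) is the repeat

# anyDuplicated() gives the index of the first repeat, or 0 if names are unique
has_dups <- anyDuplicated(names(x)) > 0
```

Knowing the positions of the repeats makes it clear which column a name-based selection would hit and which one it would miss.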

When I try to drop the first column with select(), I get an error:

library(dplyr)
x %>%
  select(-1) %>%
  filter(b > 1)

Error: found duplicated column name: a

I can easily get rid of the column using traditional indexing and then use dplyr to filter by value:

x <- x[, -1] %>% filter(b > 1)

This produces the desired output:

> x
  a b
1 2 3
2 2 3

Any ideas on how to perform this using only dplyr grammar?

asked Aug 03 '16 16:08 by asado23


1 Answer

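This could work, taking advantage of the behaviour of make.names(): with unique = TRUE it renames the second "a" to "a.1", which select() can then drop by pattern. Note that this keeps the first occurrence of each duplicated name. I don't know if I've cheated here (setNames() and make.names() are base R), but it relies mostly on dplyr functions.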

library(dplyr)
x %>% 
    setNames(make.names(names(.), unique = TRUE)) %>%  # second "a" becomes "a.1"
    select(-matches("\\.[0-9]+$"))                     # drop names ending in ".<digits>"
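A small sketch of what the pipeline above does step by step. Because make.names(unique = TRUE) keeps the first "a" and renames the second, the answer's version keeps the column whose values are 1. The question's desired output instead keeps the second "a" (values 2). The `kept_second` variant below is an addition, not part of the original answer: it repairs the names, drops column 1 by position, and uses rename() to restore the name "a". The regex `\\.[0-9]+$` is also an assumption here; it would additionally drop any original column whose name already ended in ".<digits>".

```r
library(dplyr)

# Example data: two identical rows of 1, 2, 3 (byrow = TRUE assumed, as in the question's output)
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

# make.names(unique = TRUE) returns c("a", "a.1", "b")
repaired <- make.names(names(x), unique = TRUE)

# The answer's approach: drop repaired duplicates, keeping the FIRST "a" (values 1)
kept_first <- x %>%
  setNames(make.names(names(.), unique = TRUE)) %>%
  select(-matches("\\.[0-9]+$")) %>%
  filter(b > 1)

# Hypothetical variant matching the question's desired output:
# drop column 1 by position, then rename the surviving "a.1" back to "a"
kept_second <- x %>%
  setNames(make.names(names(.), unique = TRUE)) %>%
  select(-1) %>%
  rename(a = a.1) %>%
  filter(b > 1)

print(kept_first)
print(kept_second)
```

The design choice is simply to make the names unique before any tidyselect verb sees them; after that, select() works by position or by name as usual.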
answered Sep 28 '22 01:09 by Chrisss