This morning, while doing some analysis with a data frame, I got an error caused by duplicated column names. I tried to find a solution using only dplyr, but I could not find anything that works. Here is an example to illustrate the problem: a data frame with a duplicated column name.
library(dplyr)
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")
When I try to drop the first column using select(), I get an error:
x %>%
  select(-1) %>%
  filter(b > 1)
Error: found duplicated column name: a
I can get rid of the column easily using traditional indexing and then using dplyr to filter by value:
x <- x[, -1] %>% filter(b > 1)
Which produces the desired output
> x
  a b
1 2 3
2 2 3
Any ideas on how to perform this using only dplyr grammar?
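This could work, taking advantage of the behaviour of make.names: with unique = TRUE it appends suffixes such as .1 and .2 to repeated names, so the duplicates can then be matched and dropped. Note that this keeps the first occurrence of each name and removes the later ones. I don't know if I've cheated here, but it mostly relies on dplyr functions.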
x %>%
  setNames(make.names(names(.), unique = TRUE)) %>%
  # drop columns whose name ends in a numeric suffix such as .1
  select(-matches("\\.[0-9]+$"))
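If the goal is specifically to drop the first column and then filter, as in the question, the same renaming idea can be sketched as follows. This is only a sketch under a few assumptions: dplyr is attached, the example data is rebuilt with byrow = TRUE so that it matches the output shown in the question, and base R's make.unique() is swapped in for make.names() (it only touches repeated names, while make.names() also rewrites syntactically invalid ones). The final rename() is an optional step that restores the original name.

```r
library(dplyr)

# Example data from the question: columns a = 1, a = 2, b = 3 in both rows
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

result <- x %>%
  setNames(make.unique(names(.))) %>%  # names become "a", "a.1", "b"
  select(-1) %>%                       # drop the first column by position
  filter(b > 1) %>%                    # keep rows where b is greater than 1
  rename(a = a.1)                      # restore the original column name

result
```

Once the names are unique, every dplyr verb works as usual, so the suffix only needs to survive until the unwanted column is gone. A column that legitimately ends in a numeric suffix (for example one already called a.1) would clash with this approach, so it is worth checking names(x) first.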