
How to remove duplicated column names in R?

Tags:

r

I have a very large matrix, and I know that some of its column names are duplicated. I want to find the duplicated column names and remove one column from each set of duplicates. I tried duplicated(), but it removes duplicate entries. Could someone help me implement this in R? The point is that columns with duplicate names might not have duplicate entries.

user2806363 asked Jun 10 '14 13:06



2 Answers

Let's say temp is your matrix

temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")

##      A  A  B
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15

You could do

temp <- temp[, !duplicated(colnames(temp))]

##      A  B
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15

Or, if you want to keep the last duplicated column, you can do

temp <- temp[, !duplicated(colnames(temp), fromLast = TRUE)]

##       A  B
## [1,]  6 11
## [2,]  7 12
## [3,]  8 13
## [4,]  9 14
## [5,] 10 15
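Since the question also asks how to find the duplicated names, here is a minimal sketch of that step. The matrix `m` and the variable names are made up for illustration; only base R functions are used. It also passes `drop = FALSE`, which keeps the result a matrix in the edge case where only one column survives.

```r
# Hypothetical example matrix with one repeated column name.
m <- matrix(seq_len(8), nrow = 2, ncol = 4)
colnames(m) <- c("A", "B", "A", "C")

# Names that occur more than once, and the positions of the repeats.
dup_names <- unique(colnames(m)[duplicated(colnames(m))])
dup_positions <- which(duplicated(colnames(m)))

# Keep the first column for each name; drop = FALSE prevents the
# result from collapsing to a plain vector if one column remains.
deduped <- m[, !duplicated(colnames(m)), drop = FALSE]
```

Here `dup_names` holds the repeated names and `dup_positions` the column indices that would be removed, so you can inspect them before dropping anything.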
David Arenburg answered Sep 20 '22 21:09


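Or, assuming a data frame, you could use subset() (the `.` placeholder only works inside a magrittr pipe, so the data frame is named explicitly here):

subset(iris, select = which(!duplicated(names(iris))))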

Note that dplyr::select is not applicable here because it requires column-uniqueness in the input data already.
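Because `iris` has no repeated column names, the effect is easier to see on a small data frame that does. The sketch below uses a made-up data frame `df`; it relies on `check.names = FALSE`, since `data.frame()` would otherwise rename the duplicates automatically.

```r
# Hypothetical data frame; check.names = FALSE keeps the duplicate name "a".
df <- data.frame(a = 1:2, a = 3:4, b = 5:6, check.names = FALSE)

# Same idea as for the matrix: keep the first column for each name.
keep <- !duplicated(names(df))
deduped_df <- df[, keep, drop = FALSE]

# Equivalent selection with subset(), using column positions.
deduped_subset <- subset(df, select = which(keep))
```

Both results contain the first `a` column and the `b` column.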

Holger Brandl answered Sep 20 '22 21:09