Uniquefy duplicate column names in R [duplicate]

I have loaded an Excel file that contains duplicate column names, and I would like to add a suffix each time a column name is repeated. For example:

problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5), B = rep(4, 5), A = rep(5, 5))
solution_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A_1 = rep(3, 5), B_1 = rep(4, 5), A_2 = rep(5, 5))

Or the column name suffixes can be '_2' and '_3'.

asked Dec 01 '17 14:12 by Michael


1 Answer

We can do this with make.unique, which also has a sep argument:

make.unique(c("A", "B", "A", "B", "A"), sep="_")
#[1] "A"   "B"   "A_1" "B_1" "A_2"

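For 'problem_df', the data.frame call uses the default check.names = TRUE, which calls make.names with unique = TRUE, which in turn calls make.unique. The default sep there is ".", so the duplicates come out as A.1, B.1, A.2.

Looking at the source of data.frame, this happens in the code block that starts at line 124 (the exact line number may differ between R versions):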

if (check.names) {
    if (fix.empty.names) 
        vnames <- make.names(vnames, unique = TRUE) ###
    else {
        nz <- nzchar(vnames)
        vnames[nz] <- make.names(vnames[nz], unique = TRUE) ###
    }
}
names(value) <- vnames
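
As a quick check of the behaviour described above, the sketch below builds the data.frame from the question with the default check.names = TRUE and compares its names with a direct make.names call. The object names df, default_names and via_make_names are only illustrative.

```r
# Default check.names = TRUE: duplicates are made unique with "." as separator
df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
                 B = rep(4, 5), A = rep(5, 5))
default_names <- names(df)
default_names
#[1] "A"   "B"   "A.1" "B.1" "A.2"

# The same result comes from calling make.names directly
via_make_names <- make.names(c("A", "B", "A", "B", "A"), unique = TRUE)
identical(default_names, via_make_names)
#[1] TRUE
```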

One option is to use check.names = FALSE and then assign the column names with make.unique and sep = "_":

problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
       B = rep(4, 5), A = rep(5, 5), check.names = FALSE)
names(problem_df) <- make.unique(names(problem_df), sep="_")
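
If this has to be done for several data.frames, the renaming step could be wrapped in a small helper. This is only a sketch: uniquefy_names is a hypothetical name, not a base R function, and it assumes the input still has its duplicate names (i.e. it was created or read without name checking).

```r
# Hypothetical helper: make duplicated column names unique with a chosen separator
uniquefy_names <- function(df, sep = "_") {
  names(df) <- make.unique(names(df), sep = sep)
  df
}

dup_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
                     B = rep(4, 5), A = rep(5, 5), check.names = FALSE)
fixed_df <- uniquefy_names(dup_df)
names(fixed_df)
#[1] "A"   "B"   "A_1" "B_1" "A_2"
```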

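Or use sub, assuming the data.frame was created with the default check.names = TRUE, so that duplicate names already carry a .\\d+ suffix. Anchoring the pattern at the end of the name leaves any other dots in the column names untouched:

sub("\\.(\\d+)$", "_\\1", names(problem_df))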
#[1] "A"   "B"   "A_1" "B_1" "A_2"
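
The question also mentions '_2' and '_3' as acceptable suffixes. make.unique always starts counting at 1, so one possible way to get that numbering is to count occurrences per name with ave and paste the counter on from the second occurrence onwards. This is a sketch with illustrative variable names (nms, idx, new_names), not part of make.unique itself.

```r
nms <- c("A", "B", "A", "B", "A")

# Running count of each name: 1 for the first occurrence, 2 for the second, ...
idx <- ave(seq_along(nms), nms, FUN = seq_along)

# Leave first occurrences alone, suffix the rest with their count
new_names <- ifelse(idx == 1, nms, paste(nms, idx, sep = "_"))
new_names
#[1] "A"   "B"   "A_2" "B_2" "A_3"
```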

answered Oct 24 '22 03:10 by akrun