Uniquefy duplicate column names in R [duplicate]

Question

So I have loaded an Excel file which contains duplicate column names. I would like to add a suffix each time a column name is repeated. So:

problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5), B = rep(4, 5), A = rep(5, 5))
solution_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A_1 = rep(3, 5), B_1 = rep(4, 5), A_2 = rep(5, 5))

Alternatively, the column name suffixes could be '_2' and '_3'.

akrun · Accepted Answer

We can use make.unique, which also has a sep argument:

make.unique(c("A", "B", "A", "B", "A"), sep="_")
#[1] "A"   "B"   "A_1" "B_1" "A_2"

In 'problem_df', the data.frame call uses the default check.names = TRUE, which calls make.names, which in turn calls make.unique with its default sep of ".".
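To make that default behaviour concrete, here is a minimal sketch in base R. The variable names default_df, default_names and direct_names are only illustrative; the point is that the dotted suffixes appear without any explicit request.

```r
# data.frame() defaults to check.names = TRUE, so repeated names are
# made unique with a "." separator before the frame is returned.
default_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
                         B = rep(4, 5), A = rep(5, 5))
default_names <- names(default_df)
default_names
#[1] "A"   "B"   "A.1" "B.1" "A.2"

# Calling make.names() directly on the raw names gives the same result.
direct_names <- make.names(c("A", "B", "A", "B", "A"), unique = TRUE)
identical(default_names, direct_names)
#[1] TRUE
```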

Looking at the source of data.frame, this happens in the code block that starts at line 124 (the ### comments mark the relevant calls):

if (check.names) {
    if (fix.empty.names)
        vnames <- make.names(vnames, unique = TRUE) ###
    else {
        nz <- nzchar(vnames)
        vnames[nz] <- make.names(vnames[nz], unique = TRUE) ###
    }
}
names(value) <- vnames

One option is to create the data.frame with check.names = FALSE, so the duplicate names are kept as they are, and then assign the column names with make.unique and sep="_":

problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
       B = rep(4, 5), A = rep(5, 5), check.names = FALSE)
names(problem_df) <- make.unique(names(problem_df), sep="_")
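For a data frame that has already been loaded, the same idea can be wrapped in a small function. This is only a sketch: uniquify_names is a hypothetical helper, not part of base R, and raw_df is a stand-in for the imported sheet. It assumes the frame still carries the repeated names after import (some readers repair names themselves while reading).

```r
# Hypothetical helper (not part of base R): rename duplicated columns
# of an existing data frame without rebuilding it.
uniquify_names <- function(df, sep = "_") {
  names(df) <- make.unique(names(df), sep = sep)
  df
}

# Stand-in for a frame read from a file with repeated headers.
raw_df <- data.frame(A = 1:2, B = 3:4, A = 5:6, check.names = FALSE)
clean_df <- uniquify_names(raw_df)
names(clean_df)
#[1] "A"   "B"   "A_1"
```

Only the names change; the column contents and their order are left as they were.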

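Or, assuming the dataset was created with the default check.names = TRUE so that the duplicate names already carry a .\d+ suffix, use sub to replace the dot with an underscore. The dot has to be escaped with a double backslash inside an R string:

sub("\\.", "_", names(problem_df))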
#[1] "A"   "B"   "A_1" "B_1" "A_2"
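The question also allows the suffixes '_2' and '_3', which make.unique does not produce because it starts counting at 1 for the first repeat. One possible sketch, using ave from base R to number each occurrence of a name (nms, idx and new_nms are illustrative variable names):

```r
nms <- c("A", "B", "A", "B", "A")

# Occurrence index of each name within its group: 1 1 2 2 3
idx <- ave(seq_along(nms), nms, FUN = seq_along)

# Keep the first occurrence unchanged and suffix later ones with their index.
new_nms <- ifelse(idx == 1, nms, paste(nms, idx, sep = "_"))
new_nms
#[1] "A"   "B"   "A_2" "B_2" "A_3"
```

To apply this to a data frame, take nms from names(problem_df) of a frame created with check.names = FALSE and assign new_nms back with names(problem_df) <- new_nms.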

Tags: r, duplicates, unique, columnname

Michael
