So I have loaded an Excel file which contains duplicate column names. I would like to add a suffix each time a column name is repeated. So:
problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5), B = rep(4, 5), A = rep(5, 5))
solution_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A_1 = rep(3, 5), B_1 = rep(4, 5), A_2 = rep(5, 5))
Or the column name suffixes can be '_2' and '_3'.
We can do this with make.unique, which also has a sep argument:
make.unique(c("A", "B", "A", "B", "A"), sep="_")
#[1] "A" "B" "A_1" "B_1" "A_2"
In 'problem_df', the data.frame call uses check.names = TRUE (the default), which calls make.names, which in turn calls make.unique with its default sep of ".".
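A quick way to see that default behaviour is the minimal sketch below. The object name df_default is made up for illustration.

```r
# data.frame() defaults to check.names = TRUE, so repeated names get ".1", ".2", ...
df_default <- data.frame(A = 1, B = 2, A = 3, B = 4, A = 5)
names(df_default)
#[1] "A"   "B"   "A.1" "B.1" "A.2"

# The same renaming, done by calling make.names() directly
direct <- make.names(c("A", "B", "A", "B", "A"), unique = TRUE)
direct
#[1] "A"   "B"   "A.1" "B.1" "A.2"
```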
Looking at the source of data.frame, this happens in the code block that starts at line 124:
if (check.names) {
    if (fix.empty.names)
        vnames <- make.names(vnames, unique = TRUE) ###
    else {
        nz <- nzchar(vnames)
        vnames[nz] <- make.names(vnames[nz], unique = TRUE) ###
    }
}
names(value) <- vnames
One option is to use check.names = FALSE and then assign the column names with make.unique and sep="_":
problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
B = rep(4, 5), A = rep(5, 5), check.names = FALSE)
names(problem_df) <- make.unique(names(problem_df), sep="_")
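If the suffixes should instead start at '_2' (the alternative mentioned in the question), one possible approach is to number each occurrence with ave and only append the count to the repeats. This is a sketch rather than a built-in option of make.unique. The names nms, occurrence and new_names are illustrative.

```r
nms <- c("A", "B", "A", "B", "A")

# Running count within each name: 1 for the first "A", 2 for the second, ...
occurrence <- ave(seq_along(nms), nms, FUN = seq_along)

# Keep first occurrences as they are, append "_<count>" to the repeats
new_names <- ifelse(occurrence == 1, nms, paste0(nms, "_", occurrence))
new_names
#[1] "A"   "B"   "A_2" "B_2" "A_3"
```

Unlike make.unique, this does not check whether a generated name such as "A_2" already exists in the data, so it may need an extra pass if that can happen.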
Or use sub, assuming the dataset was already created with a .\\d+ suffix on the duplicate column names:
sub("\\.", "_", names(problem_df))
#[1] "A" "B" "A_1" "B_1" "A_2"
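Note that "\\." matches the first dot anywhere in a name. If some columns legitimately contain a dot, a more conservative variant is to match only a dot followed by digits at the end of the name. The vector below, including the name "unit.price", is invented for illustration.

```r
nms <- c("A", "B", "A.1", "B.1", "A.2", "unit.price")

# Replace only a trailing ".<digits>" with "_<digits>"
fixed <- sub("\\.(\\d+)$", "_\\1", nms)
fixed
#[1] "A"          "B"          "A_1"        "B_1"        "A_2"        "unit.price"
```

A column whose original name really ends in a dot and digits (for example "v.1") would still be rewritten, so this remains an assumption about how the names were generated.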