
Keep duplicate column names when subsetting in R

I have a data frame with duplicate column names in R. When I select specific columns from it using subset, the duplicates are renamed to make them distinct. When creating a data frame with data.frame(), I can prevent this with the argument check.names = FALSE. Is there a way to do the same with subset (or any other method that selects named columns)?

For example, say I have the data frame

data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0, 'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9, check.names = FALSE)

selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

Using the code

subset(data, select = selectVec)

renames the duplicated confidence interval columns to 'Lower CI.1' and 'Upper CI.1', whereas I want to keep them as 'Lower CI' and 'Upper CI'. Does anyone know a way of doing this?

Thanks in advance.

user1165199 asked Mar 05 '26 21:03


1 Answer

It looks like you will get the same behavior with [: it also makes duplicated column names unique in the result. The only way I can think of is to reassign the names afterwards:

subdata <- data[, selectVec, drop = FALSE]
names(subdata) <- names(data)[selectVec]
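If this comes up repeatedly, the two lines above can be wrapped in a small helper. The following is only a sketch: select_keep_names is a hypothetical name, not a base R function, and it assumes the selector is a logical vector or a vector of integer positions (selecting by character name would be ambiguous with duplicated names).

```r
# Rebuild the example data frame from the question.
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# Hypothetical helper: select columns, then restore the original names.
# `keep` is assumed to be a logical vector or integer positions.
select_keep_names <- function(df, keep) {
  out <- df[, keep, drop = FALSE]  # `[` makes duplicated names unique here
  names(out) <- names(df)[keep]    # put the original (duplicated) names back
  out
}

subdata <- select_keep_names(data, selectVec)
names(subdata)
# "sample" "x_mean" "Lower CI" "Upper CI" "y_mean" "Lower CI" "Upper CI"
```

Note that any later column selection on subdata with [ or subset will make the names unique again, so the names would have to be restored after each such step.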

However, be aware that duplicated column names are an unnatural, complicated and risky format for keeping your data. I would try to understand why the file or data.frame had duplicated columns in the first place and fix it there.
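As one possible way of fixing it at the source, the columns could be given distinct, meaningful names before any subsetting. This is a rough sketch that assumes, based on the layout in the question, that each pair of CI columns belongs to the mean immediately before it and that the second 'sample' column repeats the first; the "x" and "y" prefixes are my own choice.

```r
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)

# Assumption: columns 3-4 are the CI for x_mean, columns 7-8 the CI for y_mean.
names(data)[c(3, 4)] <- paste("x", names(data)[c(3, 4)])
names(data)[c(7, 8)] <- paste("y", names(data)[c(7, 8)])

# Assumption: the second 'sample' column is redundant, so drop repeated names.
data <- data[, !duplicated(names(data)), drop = FALSE]

names(data)
# "sample" "x_mean" "x Lower CI" "x Upper CI" "y_mean" "y Lower CI" "y Upper CI"
```

With unique names, subset and [ leave the column names alone, and columns can be selected safely by name.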

flodel answered Mar 08 '26 09:03