I have a data frame in R with duplicate column names. When I select specific columns from it using subset(), the duplicates are renamed so that they become distinct. When creating a data frame with data.frame() I can prevent this with the argument check.names = FALSE. Is there a way to do the same with subset() (or any other method that selects named columns)?
For example, say I have the data frame
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0, 'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9, check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
Using the code
subset(data, select = selectVec)
renames the duplicated confidence-interval columns to 'Lower CI.1' and 'Upper CI.1', whereas I want to keep them as 'Lower CI' and 'Upper CI'. Does anyone know a way of doing this?
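A minimal reproduction, assuming a reasonably recent base R: it checks that both subset() and plain [ return exactly the names that make.unique() would produce for the selected columns, which is where the .1 suffixes come from. The variable names via_subset, via_bracket and expected are just illustrative.

```r
# Rebuild the example data frame, keeping the duplicated column names as-is
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# Names returned by subset() and by [ for the same selection
via_subset  <- names(subset(data, select = selectVec))
via_bracket <- names(data[, selectVec, drop = FALSE])

# What make.unique() produces for the selected (still duplicated) names
expected <- make.unique(names(data)[selectVec])

# Both selection methods de-duplicate in the same way
stopifnot(identical(via_subset, expected), identical(via_bracket, expected))
print(via_subset)
```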
Thanks in advance.
You get the same behavior with [, because subset() on a data frame calls [ internally, and [ passes the selected column names through make.unique(). The only way I can think of is to reassign the names afterwards:
subdata <- data[, selectVec, drop = FALSE]
names(subdata) <- names(data)[selectVec]
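If you need this in several places, the two lines above can be wrapped in a small function. This is only a sketch: select_keep_names() is a hypothetical helper, not part of base R, and it deliberately rejects character indices because a name such as 'Lower CI' matches more than one column.

```r
# Hypothetical helper (not part of base R): select columns by a logical or
# integer index, then restore the original (possibly duplicated) names.
select_keep_names <- function(df, cols) {
  if (is.character(cols)) {
    stop("use a logical or integer index; duplicated names make character lookup ambiguous")
  }
  out <- df[, cols, drop = FALSE]   # [ makes the selected names unique here
  names(out) <- names(df)[cols]     # names<- keeps duplicates as given
  out
}

data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

subdata <- select_keep_names(data, selectVec)
print(names(subdata))
```

Negative integer indices (for example select_keep_names(data, -5)) should also work, since both df[, cols] and names(df)[cols] accept them.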
However, be aware that duplicated column names are an unnatural, awkward (as this question shows) and risky format for storing your data. I would try to find out why the file or data frame had duplicated columns in the first place and fix it there.
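To make the risk concrete, a short sketch: with duplicated names, lookup by name silently returns only the first matching column, so the second 'Lower CI' cannot be reached by name at all. The renaming at the end is one hypothetical way to fix the data at the source; the x_/y_ prefixes are an assumption based on the column order in the question (each block of four columns seemingly belonging to one estimate).

```r
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)

# Name-based lookup only finds the first of the duplicated columns
first_lower <- data[["Lower CI"]]           # the x interval; the y interval is ignored
n_lower <- sum(names(data) == "Lower CI")   # number of columns sharing that name

# Hypothetical renaming scheme: prefix every column with the estimate it belongs to
fixed <- data
names(fixed) <- c("x_sample", "x_mean", "x_lower_ci", "x_upper_ci",
                  "y_sample", "y_mean", "y_lower_ci", "y_upper_ci")
stopifnot(!anyDuplicated(names(fixed)))
```

With unique names, subset(fixed, select = ...) and fixed[...] no longer need any renaming afterwards.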