Keep duplicate column names when subsetting in R

Question

I have a data frame with duplicate column names in R. When I select specific columns from it using subset(), the duplicates are renamed to make them distinct. When creating a data frame with data.frame(), I can stop this from happening by passing check.names = FALSE. Is there a way to do the same with subset() (or any other way of selecting named columns)?

For example, say I have the data frame

data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)

selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

Using the code

subset(data, select = selectVec)

renames the duplicated confidence interval columns to 'Lower CI.1' and 'Upper CI.1', whereas I want to keep them as 'Lower CI' and 'Upper CI'. Does anyone know a way of doing this?

Thanks in advance.

flodel · Accepted Answer

It looks like you will get the same behavior with [, which also makes duplicated names unique when you select columns. The only way I can think of is to reassign the names afterwards:

subdata <- data[, selectVec, drop = FALSE]
names(subdata) <- names(data)[selectVec]
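
If you need this in more than one place, the two lines above can be wrapped in a small helper. This is only a sketch: select_keep_names is a hypothetical name (it is not part of base R), and it assumes cols is a logical or integer index, since looking the names up by character would only ever find the first of each duplicate.

```r
# Hypothetical helper, not part of base R: select columns with `[`, then put
# back the original names that `[.data.frame` made unique.
# Assumes `cols` is a logical or integer index (not a character vector).
select_keep_names <- function(df, cols) {
  out <- df[, cols, drop = FALSE]
  names(out) <- names(df)[cols]
  out
}

# Data from the question
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

subdata <- select_keep_names(data, selectVec)
names(subdata)  # 'Lower CI' and 'Upper CI' each appear twice, with no '.1' suffix
```

Keep in mind that any later column selection on subdata with [ or subset() will make the names unique again, so the names have to be restored after each such step.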

However, be aware that duplicated column names are an unnatural, complicated and risky format for keeping your data. For example, selecting a column by name will only ever return the first match. I would try to understand why the file or data.frame had duplicated columns in the first place and fix it there.
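
As one possible way of fixing it at the source, here is a sketch that gives the duplicated columns distinct names before any subsetting. It assumes, based only on the example in the question, that the first four columns describe x and the last four describe y; the "x "/"y " prefixes are an illustrative choice, not something from the original data.

```r
# Data from the question
data <- data.frame('sample' = 50, 'x_mean' = 1.5, 'Lower CI' = 1.0, 'Upper CI' = 2.0,
                   'sample' = 50, 'y_mean' = 0.6, 'Lower CI' = 0.3, 'Upper CI' = 0.9,
                   check.names = FALSE)
selectVec <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# Assumption: columns 1-4 belong to x and columns 5-8 belong to y.
prefix <- rep(c("x", "y"), each = 4)

# Flag every column whose name occurs more than once (both copies).
is_dup <- duplicated(names(data)) | duplicated(names(data), fromLast = TRUE)

# Prefix only the duplicated names so that every name becomes unique.
names(data)[is_dup] <- paste(prefix[is_dup], names(data)[is_dup])

# With unique names, subset() has nothing left to rename.
subdata <- subset(data, select = selectVec)
names(subdata)
```

With names such as 'x Lower CI' and 'y Lower CI', each column can also be selected by name without ambiguity.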


Tags: r, duplicates, subset

user1165199
