
dplyr "Select" - Error: found duplicated column name

I am trying to extract columns from a DT into a new DT using dplyr's select():

extract_Data <- select(.data = master_merge, subjectID, activity_ID,
                           contains("mean\\(\\)"), contains("std\\(\\)"))

There are 563 columns, so I am asking select() to extract the first and second columns (subject, activity) plus every other column whose name contains mean() or std().

No duplicate columns can be created here, so I am stumped as to why this fails. I have tried every variation of select(), but I always get "Error: found duplicated column name".

How can I troubleshoot this? I have gone through all 563 column names and there are no duplicates.

scopa asked Feb 16 '15 19:02


1 Answer

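The root of the problem is invalid characters in the original column names, such as the parentheses in names containing mean() and std(). The discussion in Variable Name Restrictions in R applies to column names, too. Try forcing unique column names with valid characters using make.names(). It replaces each invalid character with "." and, with unique = TRUE, appends a numeric suffix to any repeated name. Note that the parentheses no longer exist after renaming, so the patterns passed to contains() have to be updated to match the new names.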

valid_column_names <- make.names(names = names(master_merge), unique = TRUE, allow_ = TRUE)
names(master_merge) <- valid_column_names
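
To troubleshoot before renaming, it can help to let R report the offending names rather than scanning all 563 by eye. The following is a minimal base R sketch, not a definitive recipe. The data frame demo and its column names are hypothetical stand-ins for master_merge (the real names are not shown in the question), and one exact duplicate is included on purpose to show what the check reports.

```r
# Hypothetical stand-in for master_merge: two ID columns plus feature columns
# whose names contain "-", "(", ")" and ",", including one exact duplicate.
demo <- data.frame(1:2, 3:4, c(0.1, 0.2), c(0.3, 0.4), c(0.5, 0.6), c(0.7, 0.8))
names(demo) <- c("subjectID", "activity_ID",
                 "tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                 "fBodyAcc-bandsEnergy()-1,8", "fBodyAcc-bandsEnergy()-1,8")

# 1. Names that occur more than once.
dup_names <- unique(names(demo)[duplicated(names(demo))])

# 2. Names that are not syntactically valid, i.e. that make.names() would change.
invalid_names <- names(demo)[names(demo) != make.names(names(demo))]

# 3. Apply the fix: invalid characters become ".", repeats get a numeric suffix.
names(demo) <- make.names(names(demo), unique = TRUE)

# 4. Select the ID columns plus the mean/std columns using the NEW names.
#    fixed = TRUE makes grepl() match the pattern literally, not as a regex.
keep <- grepl(".mean..", names(demo), fixed = TRUE) |
        grepl(".std..", names(demo), fixed = TRUE)
extract_demo <- demo[, c("subjectID", "activity_ID", names(demo)[keep])]
```

With dplyr, step 4 should correspond to select(demo, subjectID, activity_ID, contains(".mean.."), contains(".std..")), since contains() matches literal text rather than a regular expression.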
Lantana answered Sep 23 '22 09:09