Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

setnames for duplicate colnames in data.table

Tags:

r

data.table

For some reasons (doesn't matter), the colnames in the excel imported dataframe have duplicates as given below (DT is data table converted from dataframe DF). However, these are unique colnames and hence the need for use of setnames.

    DF<-structure(list(X1 = c("", "15 May 2014", "16 May 2014", "18 May 2014", 
"19 May 2014"), X2 = c(NaN, 746.18, 746.18, 744.34, 739.95), 
X3 = c(NaN, 549.9, 549.9, 546.5, 549.65), X1 = c(NaN, 406.57, 
406.57, 406.66, 404.73), X1 = c(NaN, 1788.86, 1788.86, 1767.69, 
1772.34), X1 = c(NaN, 2286, 2286, 2302.37, 2313.14), X2 = c(NaN, 
3639.25, 3639.25, 3622.08, 3569.53), X3 = c(NaN, 1160.13, 
1160.13, 1144.77, 1129.72), X1 = c(NaN, 182.83, 182.83, 182.83, 
182.83), X2 = c(NaN, 787.13, 787.13, 775.39, 764.82), X1 = c(NaN, 
853.2, 853.2, 849.67, 844.49)), .Names = c("X1", "X2", "X3", 
"X1", "X1", "X1", "X2", "X3", "X1", "X2", "X1"), class = c("data.table", 
"data.frame"), row.names = c(NA, -5L))

DT<-as.data.table(DF)

 >DT

                X1     X2     X3     X1      X1      X1      X2      X3     X1     X2     X1
    1:                NaN    NaN    NaN     NaN     NaN     NaN     NaN    NaN    NaN    NaN
    2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
    5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49

So, I decided to change these colnames using setnames, but I get the following error (which is obvious):

 new_names<-c("Date","BOD","DO","FI","HT","HY","IN","MA","SE","OR","RA")
 old_names<-names(DT)
 setnames(DT, old_names, new_names)

 Error in setnames(DT, old_names, new_names) : 
  Some duplicates exist in 'old': X1,X1,X1,X2,X3,X1,X2,X1

So, I resorted to data.frame way of changing colnames

names(DT)<-new_names # this doesn't give any error but still gives warnings

Warning message:
In `names<-.data.table`(`*tmp*`, value = c("Date", "BOD", "DO",  :
  The names(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your names<- calls.
> DT
          Date    BOD     DO     FI      HT      HY      IN      MA     SE     OR     RA
1:                NaN    NaN    NaN     NaN     NaN     NaN     NaN    NaN    NaN    NaN
2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49 

So, I was wondering whether there is a data.table (unique) way of changing colnames when colnames are not unique (again, this is because the data was imported from the excel).

like image 414
user227710 Avatar asked Mar 20 '15 18:03

user227710


2 Answers

You can omit the old_names:

setnames(DT, new_names)

Will work assuming new_names has all the names in the correct order. From ?setnames:

setnames(x,old,new):
old: When new is provided, character names or numeric positions of column names to change. When new is not provided, the new column names, which must be the same length as the number of columns. See examples.

like image 149
BrodieG Avatar answered Sep 22 '22 17:09

BrodieG


I came across the same issue but I didn't care about what new names the columns can get, so I just simply needed unique names. Combining make.unique or make.names (as suggested here) with setnames (as pointed by @BrodieG above) solved my issue:

# considering your DT object:
setnames(DT, make.unique(names(DT)))
# The new column names are:
names(DT)
## [1] "X1"   "X2"   "X3"   "X1.1" "X1.2" "X1.3" "X2.1" "X3.1" "X1.4" "X2.2" "X1.5"
# Same can be achieved with:
setnames(DT, make.names(names(DT), unique = TRUE))
like image 33
Valentin Avatar answered Sep 21 '22 17:09

Valentin