For some reasons (doesn't matter), the colnames in the excel imported dataframe have duplicates as given below (DT is data table converted from dataframe DF). However, these are unique colnames and hence the need for use of setnames
.
DF<-structure(list(X1 = c("", "15 May 2014", "16 May 2014", "18 May 2014",
"19 May 2014"), X2 = c(NaN, 746.18, 746.18, 744.34, 739.95),
X3 = c(NaN, 549.9, 549.9, 546.5, 549.65), X1 = c(NaN, 406.57,
406.57, 406.66, 404.73), X1 = c(NaN, 1788.86, 1788.86, 1767.69,
1772.34), X1 = c(NaN, 2286, 2286, 2302.37, 2313.14), X2 = c(NaN,
3639.25, 3639.25, 3622.08, 3569.53), X3 = c(NaN, 1160.13,
1160.13, 1144.77, 1129.72), X1 = c(NaN, 182.83, 182.83, 182.83,
182.83), X2 = c(NaN, 787.13, 787.13, 775.39, 764.82), X1 = c(NaN,
853.2, 853.2, 849.67, 844.49)), .Names = c("X1", "X2", "X3",
"X1", "X1", "X1", "X2", "X3", "X1", "X2", "X1"), class = c("data.table",
"data.frame"), row.names = c(NA, -5L))
DT<-as.data.table(DF)
>DT
X1 X2 X3 X1 X1 X1 X2 X3 X1 X2 X1
1: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49
So, I decided to change these colnames using setnames
, but I get the following error (which is obvious):
new_names<-c("Date","BOD","DO","FI","HT","HY","IN","MA","SE","OR","RA")
old_names<-names(DT)
setnames(DT, old_names, new_names)
Error in setnames(DT, old_names, new_names) :
Some duplicates exist in 'old': X1,X1,X1,X2,X3,X1,X2,X1
So, I resorted to data.frame way of changing colnames
names(DT)<-new_names # this doesn't give any error but still gives warnings
Warning message:
In `names<-.data.table`(`*tmp*`, value = c("Date", "BOD", "DO", :
The names(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your names<- calls.
> DT
Date BOD DO FI HT HY IN MA SE OR RA
1: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49
So, I was wondering whether there is a data.table (unique) way of changing colnames when colnames are not unique (again, this is because the data was imported from the excel).
You can omit the old_names:
setnames(DT, new_names)
Will work assuming new_names
has all the names in the correct order. From ?setnames
:
setnames(x,old,new)
:old
: When new is provided, character names or numeric positions of column names to change. When new is not provided, the new column names, which must be the same length as the number of columns. See examples.
I came across the same issue but I didn't care about what new names the columns can get, so I just simply needed unique names. Combining make.unique
or make.names
(as suggested here) with setnames
(as pointed by @BrodieG above) solved my issue:
# considering your DT object:
setnames(DT, make.unique(names(DT)))
# The new column names are:
names(DT)
## [1] "X1" "X2" "X3" "X1.1" "X1.2" "X1.3" "X2.1" "X3.1" "X1.4" "X2.2" "X1.5"
# Same can be achieved with:
setnames(DT, make.names(names(DT), unique = TRUE))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With