I created a small data.table: DT = data.table(a=1:2, a=1:2).
If I use names(DT) <- c("b","b"), I get a warning:
In `names<-.data.table`(`*tmp*`, value = c("b", "b")) : The names(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your names<- calls.
But if I use setnames(DT, names(DT), c("b","b")), then I get an error:
Error in setnames(DT, names(DT), c("b", "b")) : Some duplicates exist in 'old': a
If I do the same example with a data.frame, DT = data.frame(a=1:2, a=1:2), and use names(DT) <- c("b","b"), then I get no error.
Don't provide both old and new and you won't have a problem. However, that's not the real issue. By default, base::data.frame() won't create columns with the same name (its check.names = TRUE argument makes the names unique), so...
# What you actually get...
DT = data.frame(a=1:2, a=1:2); names(DT)
#[1] "a"   "a.1"
But it seems that in data.table you can have columns of the same name...
DT = data.table(a=1:2, a=1:2); names(DT)
#[1] "a" "a"
But setnames throws an error, I guess because it doesn't know which column a refers to when both columns are called a. You get no error when going the data.frame to data.table route, because data.frame() has already made the column names unique, so there are no duplicates to be ambiguous about.
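Here is a minimal base R sketch of the data.frame side (no data.table needed). data.frame() de-duplicates names because check.names = TRUE is the default. check.names = FALSE keeps the duplicates. names<- does no uniqueness check, which is why the question's data.frame example raises no error.

```r
# Base R only: how data.frame() and names<- treat duplicated column names.

# Default check.names = TRUE applies make.names(unique = TRUE) to the names
df1 <- data.frame(a = 1:2, a = 1:2)
print(names(df1))   # "a" "a.1"

# check.names = FALSE keeps the names exactly as given, duplicates included
df2 <- data.frame(a = 1:2, a = 1:2, check.names = FALSE)
print(names(df2))   # "a" "a"

# names<- performs no uniqueness check, so duplicate names are accepted
names(df1) <- c("b", "b")
print(names(df1))   # "b" "b"
```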
Firstly, I'd say don't make columns with the same name. This is a really bad idea if you plan to use your data.table programmatically (but, as @MatthewDowle points out in the comments, allowing it is a design choice to give the user maximum freedom in data.table).
If you need to do it, then use setnames with just the old argument. When new is not given, old is treated as the new names. If you pass in old names and a vector of new names, the old names are looked up and those columns are changed to the corresponding new names. So old and new have to be the same length when setnames is called with three arguments. setnames will catch any ambiguities via:
if (any(duplicated(old))) stop("Some duplicates exist in 'old': ", paste(old[duplicated(old)], collapse = ","))
if (any(duplicated(names(x)))) stop("'old' is character but there are duplicate column names: ", paste(names(x)[duplicated(names(x))], collapse = ","))
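Here is a minimal sketch of how these checks play out in practice, assuming a reasonably recent data.table is installed. The exact error wording differs between data.table versions, so the code only checks that an error is raised, not what the message says. The table DT2 and its columns p and q are made up for illustration.

```r
library(data.table)

DT <- data.table(a = 1:2, a = 1:2)
print(names(DT))   # "a" "a": data.table() keeps duplicate names

# Three-argument form: old = names(DT) = c("a", "a") has duplicates, so it stops
res <- tryCatch(setnames(DT, names(DT), c("b", "b")),
                error = function(e) e)
print(conditionMessage(res))

# Three-argument form also needs old and new to have the same length
DT2 <- data.table(p = 1:2, q = 3:4)   # hypothetical table without duplicates
res2 <- tryCatch(setnames(DT2, "p", c("b", "c")),
                 error = function(e) e)
print(conditionMessage(res2))
```

Because both calls fail before anything is modified, DT still has the names "a" "a" afterwards.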
When just old is supplied, setnames will reassign the names from old to the columns of DT column-wise using .Call(Csetcharvec, names(x), seq_along(names(x)), old), so from first to last...
DT = data.table(a=1:2, a=1:2)
setnames(DT, c("b","b"))
DT
#    b b
#1: 1 1
#2: 2 2
Addition from Matthew, as requested. In ?setnames there's some background:
It isn't good programming practice, in general, to use column numbers rather than names. This is why setkey and setkeyv only accept column names, and why old in setnames() is recommended to be names. If you use column numbers then bugs (possibly silent) can more easily creep into your code as time progresses if changes are made elsewhere in your code; e.g., if you add, remove or reorder columns in a few months time, a setkey by column number will then refer to a different column, possibly returning incorrect results with no warning. (A similar concept exists in SQL, where "select * from ..." is considered poor programming style when a robust, maintainable system is required.) If you really wish to use column numbers, it's possible but deliberately a little harder; e.g., setkeyv(DT,colnames(DT)[1:2]).
[As of July 2017, the note above no longer appears in ?setnames, but the issue is discussed near the top of the FAQ, vignette('datatable-faq').]
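Here is a small sketch of the column-number pitfall described above, assuming data.table is installed. The column names x, y and z are made up for illustration. setnames() does accept a position in old. But once setcolorder() has reordered the columns, the same positional call silently renames a different column, while the by-name call still hits the intended one.

```r
library(data.table)

DT <- data.table(x = 1:2, y = 3:4, z = 5:6)

# Later, some other code reorders the columns in place
setcolorder(DT, c("z", "x", "y"))

# Positional rename written when column 2 was "y"; column 2 is now "x"
by_pos <- copy(DT)
setnames(by_pos, 2, "y_renamed")
print(names(by_pos))   # "z" "y_renamed" "y": the wrong column was renamed

# By-name rename still targets "y", wherever it now sits
by_name <- copy(DT)
setnames(by_name, "y", "y_renamed")
print(names(by_name))  # "z" "x" "y_renamed"
```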
So the idea of setnames is to change one column name really easily, by name:
setnames(DT, "oldname", "newname")
If "oldname" is not a column name, or there's any ambiguity over what you intend, data.table will catch it for you. That applies to the data as it is now, and also in a few months' time, after your colleagues have changed the source database or other code upstream, or have passed their own data to your module. That's actually quite hard to do in base R as easily and as well as setnames does it (including the safety checks).
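Here is a minimal sketch of the recommended one-column rename and its safety check, assuming data.table is installed. oldname, newname, newname2 and other are placeholder column names. Renaming a column that does not exist stops with an error instead of silently doing nothing. The message text varies between data.table versions, so only the error itself is checked.

```r
library(data.table)

DT <- data.table(oldname = 1:3, other = 4:6)

# Rename a single column by name, in place (the table is not copied)
setnames(DT, "oldname", "newname")
print(names(DT))  # "newname" "other"

# "oldname" no longer exists, so this is caught instead of being ignored
res <- tryCatch(setnames(DT, "oldname", "newname2"), error = function(e) e)
print(inherits(res, "error"))  # TRUE
```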