
R data.table change R names

Tags:

r

data.table

I created a small data.table: DT = data.table(a=1:2, a=1:2).

If I use names(DT) <- c("b","b")

I get a warning

In `names<-.data.table`(`*tmp*`, value = c("b", "b")) :   The names(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your names<- calls. 

But if I use setnames(DT, names(DT), c("b","b")), then I get an error:

Error in setnames(DT, names(DT), c("b", "b")) :    Some duplicates exist in 'old': a 

If I do the same with a data.frame, i.e. DT = data.frame(a=1:2, a=1:2), and use names(DT) <- c("b","b"), then I get no error.

user2771940 asked Sep 12 '13 09:09



1 Answer

Don't provide both old and new and you won't have a problem. However, that's not the real issue. With base::data.frame you can't have columns of the same name by default (check.names = TRUE), so...

# What you actually get...
DT = data.frame(a=1:2, a=1:2); names(DT)
#[1] "a"   "a.1"

But it seems that in data.table you can have columns of the same name...

DT = data.table(a=1:2, a=1:2); names(DT)
#[1] "a" "a"

But setnames throws an error, I guess because it doesn't know which column a refers to when both columns are called a. You get no error when going the data.frame to data.table route because you do not have duplicated column names.
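To make the difference concrete, here is a minimal sketch that puts the three cases side by side. The helper names (df_names, dt_names, dup_result, DT2) are only for illustration, and the error is captured with tryCatch rather than matched on its text, since the exact wording may differ between data.table versions.

```r
library(data.table)

# base::data.frame de-duplicates names by default (check.names = TRUE)
DF <- data.frame(a = 1:2, a = 1:2)
df_names <- names(DF)

# data.table keeps the duplicated name
DT <- data.table(a = 1:2, a = 1:2)
dt_names <- names(DT)

# Three-argument setnames with duplicated 'old' names is rejected;
# capture the condition instead of halting the script
dup_result <- tryCatch(
  { setnames(DT, names(DT), c("b", "b")); "renamed" },
  error = function(e) "error"
)

# Going via the data.frame gives unique old names, so the same call goes through
DT2 <- as.data.table(DF)
setnames(DT2, names(DT2), c("b", "b"))
```

Here dup_result records that the call on DT failed, while DT2 ends up with both columns named b.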

Firstly, I'd say don't make columns with the same name; this is a really bad idea if you plan to use your data.table programmatically (but as @MatthewDowle points out in the comments, allowing it is a design choice to give the user maximum freedom in data.table).

If you need to do it, then use setnames with just the old argument given, which is treated as the new names when new is not given. If you pass in old names and a vector of new names, the old names are looked up and each is changed to the corresponding new name (so old and new have to be the same length when setnames is used with 3 arguments). setnames will catch any ambiguities via:

if (any(duplicated(old)))
    stop("Some duplicates exist in 'old': ",
         paste(old[duplicated(old)], collapse = ","))
if (any(duplicated(names(x))))
    stop("'old' is character but there are duplicate column names: ",
         paste(names(x)[duplicated(names(x))], collapse = ","))

When just old is supplied setnames will reassign the names from old to the columns of DT column-wise using .Call(Csetcharvec, names(x), seq_along(names(x)), old), so from first to last...

DT = data.table(a=1:2, a=1:2)
setnames(DT, c("b","b"))
DT
#   b b
#1: 1 1
#2: 2 2
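As a rough check of the one-argument form, the sketch below compares the table's address before and after the rename, and also tries a vector of the wrong length. The variables before, after, same_object and wrong_length are illustrative names, not part of data.table; address() is data.table's own helper.

```r
library(data.table)

DT <- data.table(a = 1:2, a = 1:2)
before <- address(DT)

# One-argument form: the vector is taken as the full set of new names,
# assigned in column order
setnames(DT, c("b", "b"))

after <- address(DT)
same_object <- identical(before, after)  # the rename happens in place

# The one-argument form needs one name per column; a shorter vector is rejected
wrong_length <- tryCatch(
  { setnames(DT, "z"); "renamed" },
  error = function(e) "error"
)
```

If same_object is TRUE, the table was renamed without being copied, which is the point of the warning quoted in the question.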

Addition from Matthew as requested. In ?setnames there's some background :

It isn't good programming practice, in general, to use column numbers rather than names. This is why setkey and setkeyv only accept column names, and why old in setnames() is recommended to be names. If you use column numbers then bugs (possibly silent) can more easily creep into your code as time progresses if changes are made elsewhere in your code; e.g., if you add, remove or reorder columns in a few months time, a setkey by column number will then refer to a different column, possibly returning incorrect results with no warning. (A similar concept exists in SQL, where "select * from ..." is considered poor programming style when a robust, maintainable system is required.) If you really wish to use column numbers, it's possible but deliberately a little harder; e.g., setkeyv(DT,colnames(DT)[1:2]).

[As of July 2017, the note above no longer appears in ?setnames, but the issue is discussed near the top of the FAQ, vignette('datatable-faq').]

So the idea of setnames is to change one column name really easily, by name.

setnames(DT, "oldname", "newname") 

If "oldname" is not a column name or there's any ambiguity over what you intend (either in the data now or in a few months time after your colleagues have changed the source database or other code upstream or have passed their own data to your module) then data.table will catch it for you. That's actually quite hard to do in base as easily and as well as setnames does it (including the safety checks).
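A small sketch of that safety check, assuming a toy table with made-up column names (oldname, other): the rename is run twice, and the second attempt is compared with a common base idiom on a data.frame. second_try is just an illustrative variable.

```r
library(data.table)

DT <- data.table(oldname = 1:2, other = 3:4)
setnames(DT, "oldname", "newname")   # renames exactly one column, by name

# Running the same rename again: "oldname" no longer exists, so setnames stops
second_try <- tryCatch(
  { setnames(DT, "oldname", "newname"); "renamed" },
  error = function(e) "error"
)

# A common base idiom does nothing at all in the same situation, with no message
DF <- data.frame(newname = 1:2, other = 3:4)
names(DF)[names(DF) == "oldname"] <- "newname"
```

The base version leaves DF untouched and says nothing, so a stale column name upstream would go unnoticed, whereas setnames raises an error.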

Simon O'Hanlon answered Sep 22 '22 08:09