
Rename multiple dataframe columns, referenced by current names

I want to rename some random columns of a large data frame and I want to use the current column names, not the indexes. Column indexes might change if I add or remove columns to the data, so I figure using the existing column names is a more stable solution. This is what I have now:

mydf = merge(df.1, df.2)
colnames(mydf)[which(colnames(mydf) == "MyName.1")] = "MyNewName"

Can I simplify this code, either the original merge() call or just the second line? "MyName.1" is actually the result of an xts merge of two different xts objects.

Robert Kubrick asked Feb 14 '12 19:02





2 Answers

The trouble with changing column names of a data.frame is that, almost unbelievably, the entire data.frame is copied. Even when it's in .GlobalEnv and no other variable points to it.

The data.table package has a setnames() function that changes column names by reference, without copying the whole dataset. data.table differs from data.frame in that it doesn't copy-on-write, which can be very important for large datasets. (You did say your data set was large.) Simply provide the old and the new names:

require(data.table)
setnames(DT, "MyName.1", "MyNewName")
# or more explicit:
setnames(DT, old = "MyName.1", new = "MyNewName")
?setnames
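Since the question mentions renaming several columns, here is a minimal sketch of the same call with vectors of names. The table `DT` and its column names are made up for illustration, and the `address()` check is only there to show that the object is not copied; it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical example table; the column names are invented for illustration.
DT <- data.table(id = 1:3,
                 MyName.1 = c(1.5, 2.5, 3.5),
                 MyName.2 = c("a", "b", "c"))

# Record where the table lives in memory before renaming.
addr_before <- address(DT)

# Rename two columns in one call; old and new are paired by position.
setnames(DT,
         old = c("MyName.1", "MyName.2"),
         new = c("MyNewName", "SecondNewName"))

# TRUE if the table is still the same object, i.e. no copy was made.
same_object <- identical(addr_before, address(DT))

print(names(DT))
print(same_object)
```

Because `old` and `new` are matched pairwise, the order of the columns inside the table does not matter.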
Matt Dowle answered Sep 18 '22 23:09



names(mydf)[names(mydf) == "MyName.1"] = "MyNewName" # 13 characters shorter.  

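Eventually, though, you may want to rename several columns at once. In that case, use %in% instead of ==, with a vector of old names in place of "MyName.1" and a vector of new names of equal length in place of "MyNewName". Note that with %in% the new names are assigned in the order the matching columns appear in the data frame, not in the order of the old-name vector, so list the new names in column order.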
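As a sketch of the multi-column case in base R, the variant below uses match() rather than %in%, so each new name is paired with its own old name whatever the column order. This swaps in a different technique from the one in the answer; the data frame, `old_names`, and `new_names` are hypothetical names chosen for illustration.

```r
# Hypothetical example data; the column names are invented for illustration.
mydf <- data.frame(id = 1:3,
                   MyName.1 = c(1.5, 2.5, 3.5),
                   MyName.2 = c("a", "b", "c"))

# Assumed mapping of old names to new names, deliberately not in column order.
old_names <- c("MyName.2", "MyName.1")
new_names <- c("SecondNewName", "MyNewName")

# match() gives the position of each old name in names(mydf),
# or NA when that name is not present.
idx <- match(old_names, names(mydf))
found <- !is.na(idx)

# Assign each new name to the column its old name was found in.
names(mydf)[idx[found]] <- new_names[found]

print(names(mydf))
```

Old names that are missing from the data frame are skipped silently here; whether that should instead raise an error is a design choice.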

Brandon Bertelsen answered Sep 21 '22 23:09
