
Difference between `names(df[1]) <- ` and `names(df)[1] <- `


Consider the following:

    df <- data.frame(a = 1, b = 2, c = 3)

    names(df[1]) <- "d" ## First method
    ##  a b c
    ##1 1 2 3

    names(df)[1] <- "d" ## Second method
    ##  d b c
    ##1 1 2 3

Neither method returned an error, but the first did not change the column name, while the second did.

I thought it had something to do with the fact that I'm operating only on a subset of df, but then why does the following work fine?

    df[1] <- 2
    ##  a b c
    ##1 2 2 3
David Arenburg asked May 02 '14 12:05




1 Answer

What I think is happening is that replacement into a data frame ignores the attributes (such as the names) of the data frame being assigned from. I am not 100% sure of this, but the following experiments appear to back it up:

    df <- data.frame(a = 1:3, b = 5:7)
    #   a b
    # 1 1 5
    # 2 2 6
    # 3 3 7

    df2 <- data.frame(c = 10:12)
    #    c
    # 1 10
    # 2 11
    # 3 12

    df[1] <- df2[1]   # in this case `df[1] <- df2` is equivalent

Which produces:

    #    a b
    # 1 10 5
    # 2 11 6
    # 3 12 7

Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.
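The same behaviour can be checked programmatically. The following is a minimal sketch, assuming a made-up replacement data frame `new` with columns `p` and `q` (names chosen only for illustration): both existing columns of `df` are overwritten, and the names of `new` are dropped.

```r
df  <- data.frame(a = 1:3, b = 5:7)
new <- data.frame(p = 10:12, q = 20:22)  # hypothetical replacement data

df[1:2] <- new              # replace both existing columns by position

kept_names  <- names(df)    # "a" "b": the names of `new` were not carried over
took_values <- identical(df$a, new$p) && identical(df$b, new$q)

kept_names
took_values
```

Only existing columns are replaced here; the sketch makes no claim about what happens when the assignment creates new columns.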

In the scenario:

names(df[2]) <- "x" 

You can think of the assignment as follows (this is a simplification, see end of post for more detail):

    tmp <- df[2]
    #   b
    # 1 5
    # 2 6
    # 3 7

    names(tmp) <- "x"
    #   x
    # 1 5
    # 2 6
    # 3 7

    df[2] <- tmp   # `tmp` has "x" for names, but it is ignored!
    #    a b
    # 1 10 5
    # 2 11 6
    # 3 12 7

The last step is an assignment with `[<-`, which does not respect the names attribute of the right-hand side.

But in the scenario:

names(df)[2] <- "x" 

you can think of the assignment as (again, a simplification):

    tmp <- names(df)
    # [1] "a" "b"

    tmp[2] <- "x"
    # [1] "a" "x"

    names(df) <- tmp
    #    a x
    # 1 10 5
    # 2 11 6
    # 3 12 7

Notice how we assign directly to the names, instead of assigning to df, which ignores the attributes of the assigned value.
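In practice this means indexing the names vector rather than the data frame. A small sketch of two common variants, assuming the same toy data frame as above (the new names "x" and "y" are arbitrary):

```r
df <- data.frame(a = 1:3, b = 5:7)

# Rename by position: subset the names vector, not the data frame
names(df)[2] <- "x"

# Rename by matching the current name, which does not depend on column order
names(df)[names(df) == "a"] <- "y"

names(df)
```

Both lines follow the `names(df)[i] <- value` form, so the final step is always an assignment to `names(df)` itself.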

df[2] <- 2 

works because we are assigning directly to the values, not the attributes, so there are no problems here.


EDIT: based on some commentary from @AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):

Version 1: `names(df[2]) <- "x"` translates to:

    df <- `[<-`(
      df, 2,
      value=`names<-`(   # `names<-` here returns a re-named one column data frame
        `[`(df, 2),
        value="x"
    ) )

Version 2: `names(df)[2] <- "x"` translates to:

    df <- `names<-`(
      df,
      `[<-`(
        names(df), 2, "x"
    ) )
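To compare the two expansions side by side, they can be evaluated as ordinary function calls. This is only a sketch of the simplified translations above, not of what the interpreter does internally; the results are stored in `v1` and `v2` (names introduced here for illustration) instead of being assigned back to `df`.

```r
df <- data.frame(a = 1:3, b = 5:7)

# Version 1: the outermost call is `[<-`, which keeps the names of `df`
v1 <- `[<-`(df, 2, value = `names<-`(`[`(df, 2), value = "x"))

# Version 2: the outermost call is `names<-`, which sets the names of `df`
v2 <- `names<-`(df, `[<-`(names(df), 2, "x"))

names(v1)  # "a" "b"
names(v2)  # "a" "x"
```

The difference comes down to which replacement function is applied last to the whole data frame.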

Also, it turns out this is "documented" in R Inferno Section 8.2.34 (thanks @Frank):

    right <- wrong <- c(a=1, b=2)
    names(wrong[1]) <- 'changed'
    wrong
    # a b
    # 1 2
    names(right)[1] <- 'changed'
    right
    # changed b
    # 1 2
BrodieG answered Oct 13 '22 01:10
