
How to add a column in the data frame within a function

Tags:

r

I have a data frame, and I want to do some calculations with existing columns and create a new column that is a combination of existing ones. I can do this easily outside a function, but if I wrap the code within a function, the changes made inside the function are not visible outside it, i.e. the new column doesn't exist.

I would appreciate sample code to do this...

user318247 asked Jun 15 '11 13:06



2 Answers

I'll assume this is about R. R does not pass arguments by reference (environments and reference classes, sometimes called R5, are exceptions, but they are beyond the scope of this answer). Thus, when you write

addThree<-function(x){
 x<-x+3
}
4->y
addThree(y)

y is still 4 at the end of the code, because inside the function x is a fresh copy of y's value, not y itself (again, not exactly, but those are higher-order details).

Thus, you must adapt to R's pass-by-copy scheme: return the altered value and assign it back to your variable (in older terminology, R has no procedures, only functions):

addThree<-function(x){
 return(x+3)
}
4->y
addThree(y)->y
#y is now 7

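Don't worry, this works smoothly even for larger, more complex objects such as data frames: R copies an object only when it is actually modified (copy-on-modify), and copies that are no longer needed are garbage-collected.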

BTW, you can omit return() if you want to return the last value produced in the function, i.e. addThree's definition may look like this:

addThree<-function(x) x+3
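
The same return-and-reassign pattern applies to the data frame case from the question. Here is a minimal sketch in base R; the function name add_total, the data frame scores, and the column names a, b and total are all hypothetical and chosen only for illustration:

```r
# Hypothetical helper: adds a column `total` computed from columns `a` and `b`.
add_total <- function(df) {
  df$total <- df$a + df$b   # changes only the function's local copy
  df                        # return the modified copy
}

# Hypothetical sample data
scores <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

add_total(scores)            # the result is printed but not stored
"total" %in% names(scores)   # FALSE: the original data frame is unchanged

scores <- add_total(scores)  # assign the result back to keep the new column
scores$total                 # 11 22 33
```

Calling the function without assigning its result is most likely what makes the new column appear to be missing outside the function.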
mbq answered Sep 22 '22 04:09


The best approach is to use mutate() from the dplyr package. Example:

library(dplyr)  # provides mutate()

addcol = function(dat){
    # mutate() returns a new data frame; dat itself is not modified
    dat1 = mutate(dat, x2=x1*2)
    return(dat1)
}

Here dat is a data frame with a numeric column named "x1". addcol() returns a new data frame with an additional column "x2" that is twice the value of "x1". Assign the result back, e.g. dat = addcol(dat), to keep the new column.
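
As a usage sketch, assuming the dplyr package is installed, the function can be called as follows. The sample values in dat are hypothetical, and addcol() is redefined in a shorter form so the block runs on its own:

```r
library(dplyr)

# Same idea as addcol() above, written without the intermediate variable.
addcol <- function(dat) {
  mutate(dat, x2 = x1 * 2)
}

dat <- data.frame(x1 = c(1, 2, 3))  # hypothetical sample data
dat <- addcol(dat)                  # assign the result back to keep the new column
dat$x2                              # 2 4 6
```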

zephyr answered Sep 23 '22 04:09