 

Newly added column in 'j' of data.table should be available in the scope

Tags:

r

data.table

I have this code:

dat<-dat[,list(colA,colB
                     ,RelativeIncome=Income/.SD[Nation=="America",Income]
                     ,RelativeIncomeLog2=log2(Income)-log2(.SD[Nation=="America",Income])) #Read 1)
               ,by=list(Name,Nation)]

1) I would like to be able to write "RelativeIncomeLog2=log2(RelativeIncome)", but "RelativeIncome" is not available in j's scope. Is there a way to refer to it?

2) I tried the following instead (per the data.table FAQ). Now "RelativeIncome" is available but it doesn't add the columns:

     dat<-dat[,{colA;colB;RelativeIncome=Income/.SD[Nation=="America",Income];
               RelativeIncomeLog2=log2(RelativeIncome)}
               ,by=list(Name,Nation)]
asked May 12 '13 18:05 by varuman




1 Answer

You can create and assign objects in j; just use { curly braces }.

You can then pass these objects (or functions and calculations of them) out of j and assign them as columns of the data.table. To assign more than one column at a time, simply:

  • wrap the LHS in c(.) and make sure the column names are strings, and
  • make the last line of j (i.e., the "return" value) a list.

  dat[ , c("NewIncomeColumn", "AnotherNewColumn") := {
                 RelativeIncome     <- Income / .SD[Nation == "America", Income]
                 RelativeIncomeLog2 <- log2(RelativeIncome)
                 ## this last line is what will be assigned
                 list(RelativeIncomeLog2 * 100, "hello")
                 # A length-1 value is recycled to the group size; any other
                 # length must match the group size exactly (older versions of
                 # data.table recycled mismatched lengths with a warning).
                }
                , by = list(Name, Nation)
               ]
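To make the pattern above runnable end to end, here is a minimal sketch on a small hypothetical dat (the Name/Nation/Income values are invented for illustration, and the data.table package is assumed to be installed). It groups by Name only, as an assumption, so that every group contains an "America" row to compare against; it shows an intermediate object created inside { } and then reused before the final list is returned.

```r
library(data.table)

# Hypothetical toy data: one row per (Name, Nation) pair.
dat <- data.table(
  Name   = c("x", "x", "y", "y"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 200, 400)
)

# Assign two columns at once: character vector on the LHS,
# a list as the last value of the braced j expression.
dat[, c("RelativeIncome", "RelativeIncomeLog2") := {
  # Intermediate object, visible only inside this j expression
  RelativeIncome <- Income / Income[Nation == "America"]
  # ...which a later line can reuse before returning the list
  list(RelativeIncome, log2(RelativeIncome))
}, by = Name]

print(dat)
```

Because := works by reference, dat gains both columns without being reassigned.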

You can loosely think of j as a function evaluated within the environment of dat.

You can get much more sophisticated if required. You can also name the grouping columns in the by argument, using by=list(<someName>=col).
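As a small illustration of naming grouping columns, the sketch below uses a hypothetical toy table (data.table assumed installed) and gives the by expressions the invented names Person and IsUS; the second one is a computed expression rather than a plain column.

```r
library(data.table)

# Hypothetical toy data
dat <- data.table(
  Name   = c("x", "x", "y", "y"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 200, 400)
)

# Each element of the by list is named, so the result uses those names
# instead of the default ones; IsUS is computed on the fly.
res <- dat[, list(Total = sum(Income)),
           by = list(Person = Name, IsUS = Nation == "America")]

print(res)
```

Without keyby, the groups appear in the order they are first encountered in dat.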

In fact, as with functions, simply creating an object in j and assigning it a value does not make it available outside of j. For it to be assigned to your data.table, you must return it. j automatically returns the value of its last line; if that value is a list, each element of the list is treated as a column. If you are also assigning by reference (i.e., using :=), you will get the result you are expecting.
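The difference between returning a plain value and returning a list can be seen in a short sketch, again on hypothetical toy data grouped by Name only (data.table assumed installed). The first call mirrors the braced attempt from the question: only the last value comes back, as a column named V1. The second returns a named list, so each element becomes a column, but of a new table; dat itself is untouched in both cases because no := is involved.

```r
library(data.table)

# Hypothetical toy data
dat <- data.table(
  Name   = c("x", "x", "y", "y"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 200, 400)
)

# 1) Last value is a plain vector: it is returned as column V1;
#    the intermediate RelativeIncome is discarded when j finishes.
out1 <- dat[, {
  RelativeIncome <- Income / Income[Nation == "America"]
  log2(RelativeIncome)
}, by = Name]

# 2) Last value is a named list: each element becomes a column
#    of the returned (new) data.table.
out2 <- dat[, {
  RelativeIncome <- Income / Income[Nation == "America"]
  list(Nation             = Nation,
       RelativeIncome     = RelativeIncome,
       RelativeIncomeLog2 = log2(RelativeIncome))
}, by = Name]

print(out1)
print(out2)
```

To add such columns to dat itself, the list has to be paired with := on the LHS, as in the answer's code.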


On a separate note, I noticed the following in your code:

 Income / .SD[Nation == "America", Income]

 # Which instead could simply be: 
 Income / Income[Nation == "America"]

.SD is great because it is a convenient shorthand. However, invoking it when you do not need all of the columns it encapsulates burdens your code with extra memory cost. If you are using only a single column, consider naming that column explicitly, or add the .SDcols argument (after j) and name the columns needed there.
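The three alternatives can be compared side by side in a minimal sketch on hypothetical data. It assumes grouping by Name only, so that Nation is a column of .SD (a grouping column is not part of .SD), and it adds an invented Other column that plain .SD would carry along needlessly. All three calls should give the same values.

```r
library(data.table)

# Hypothetical toy data; "Other" stands in for columns j never needs
dat <- data.table(
  Name   = c("x", "x", "y", "y"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 200, 400),
  Other  = c(1, 2, 3, 4)
)

# Subsetting .SD: .SD holds every non-grouping column of each group.
via_sd <- dat[, Income / .SD[Nation == "America", Income], by = Name]

# Referring to the column directly: no .SD needed at all.
direct <- dat[, Income / Income[Nation == "America"], by = Name]

# Keeping .SD but restricting it to the columns actually used.
via_sdcols <- dat[, .SD$Income / .SD[Nation == "America", Income],
                  by = Name, .SDcols = c("Nation", "Income")]

print(direct)
```

The direct form is usually the simplest to read; .SDcols is the natural choice when j genuinely needs to operate over a chosen set of columns.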

answered Nov 14 '22 22:11 by Ricardo Saporta