
Newly added column in 'j' of data.table should be available in the scope

Tags:

r

data.table

I have this code:

dat<-dat[,list(colA,colB
                     ,RelativeIncome=Income/.SD[Nation=="America",Income]
                     ,RelativeIncomeLog2=log2(Income)-log2(.SD[Nation=="America",Income])) #Read 1)
               ,by=list(Name,Nation)]

1) I would like to be able to say "RelativeIncomeLog2=log2(RelativeIncome)", but "RelativeIncome" is not available in j's scope?

2) I tried the following instead (per the data.table FAQ). Now "RelativeIncome" is available but it doesn't add the columns:

     dat<-dat[,{colA;colB;RelativeIncome=Income/.SD[Nation=="America",Income];
               RelativeIncomeLog2=log2(RelativeIncome)}
               ,by=list(Name,Nation)]


asked May 12 '13 18:05

varuman

1 Answer

You can create and assign objects in j; just wrap j in { curly braces }.

You can then pass these objects (or functions and calculations of them) out of j and assign them as columns of the data.table. To assign more than one column at a time:

- wrap the LHS in c(...) and make sure the column names are strings, and
- make the last line of j (i.e., the "return" value) a list, with one element per new column.

  dat[ , c("NewIncomeColumn", "AnotherNewColumn") := {
                 RelativeIncome     <- Income / .SD[Nation == "America", Income]
                 RelativeIncomeLog2 <- log2(RelativeIncome)
                 ## this last line is what will be assigned (one list element per new column)
                 list(RelativeIncomeLog2 * 100, rep_len(c("A", "hello", "World"), .N))
                 # each element must have length 1 or .N (the group size); current
                 # data.table versions stop with an error on any other length, so
                 # recycle explicitly, e.g. with rep_len()
                }
                , by = list(Name, Nation)
               ]
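A self-contained version of the same idea is sketched below. It assumes a small invented table whose column names mirror the question (Name, Nation, Income) and, unlike the code above, groups by Name only, so that every group still contains its "America" row to divide by. The new column names are illustrative.

```r
library(data.table)

# Invented toy data; only the column names come from the question.
dat <- data.table(
  Name   = c("a", "a", "a", "b", "b", "b"),
  Nation = c("America", "France", "Japan", "America", "France", "Japan"),
  Income = c(100, 50, 200, 80, 40, 160)
)

# Two new columns in one := call; j is a { } block evaluated once per Name group.
dat[, c("RelativeIncome", "RelativeIncomeLog2") := {
  # An intermediate object created inside j ...
  relative <- Income / Income[Nation == "America"]
  # ... can be reused by the following lines, which is what part 1) asked for.
  list(relative, log2(relative))  # last value: one list element per new column
}, by = Name]

print(dat)
```

Because := writes into dat by reference, there is no dat <- ... assignment here.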

You can loosely think of j as a function evaluated within the environment of dat, where the columns are visible as variables.

You can get considerably more sophisticated if required. You can also name the grouping columns (or grouping expressions) through the by argument, using by=list(<someName>=col).
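As a small illustration of naming grouping columns inside by, here is a sketch on invented toy data (the names Country, IsAmerica, TotalIncome and N are chosen for this example only):

```r
library(data.table)

# Invented toy data with the question's column names.
dat <- data.table(
  Name   = c("a", "a", "b"),
  Nation = c("America", "France", "America"),
  Income = c(100, 50, 80)
)

# The name given inside by = list(...) becomes the grouping column's name in the result.
totals <- dat[, list(TotalIncome = sum(Income)), by = list(Country = Nation)]
print(totals)

# The grouping entry can also be an expression, named the same way.
flags <- dat[, list(N = .N), by = list(IsAmerica = Nation == "America")]
print(flags)
```

Groups appear in order of first appearance, so America comes before France, and TRUE before FALSE.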

In fact, just as with a function, creating an object in j and assigning it a value does not make it available outside of j. For a value to end up in your data.table, j must return it. j automatically returns the value of its last line; if that last line is a list, each element of the list is handled as a column. If you assign by reference (i.e., with :=), those columns are added to dat, which gives the result you were expecting. Without :=, j's result is returned as a new data.table holding only the grouping columns and whatever j returned, so assigning it back with dat <- (as in your second attempt) discards the other columns.
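The contrast between returning a list with and without := can be sketched as follows. The toy data and the names summary_dt and tmpRatio are invented for illustration, and the grouping is by Name only so that each group contains an America row.

```r
library(data.table)

dat <- data.table(
  Name   = c("a", "a", "b", "b"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 80, 20)
)

# Without :=, j's last value (a list) becomes a NEW data.table made of the
# grouping column plus the returned columns; dat itself is not modified.
summary_dt <- dat[, {
  tmpRatio <- Income / Income[Nation == "America"]
  list(Nation = Nation, RelativeIncomeLog2 = log2(tmpRatio))
}, by = Name]

# With :=, the value computed per group is written into dat by reference.
dat[, RelativeIncomeLog2 := {
  tmpRatio <- Income / Income[Nation == "America"]
  log2(tmpRatio)
}, by = Name]

# tmpRatio lived only inside j, so it does not exist in the calling environment.
print(exists("tmpRatio"))
```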

On a separate note, I noticed the following in your code:

 Income / .SD[Nation == "America", Income]

 # Which instead could simply be: 
 Income / Income[Nation == "America"]
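A quick check that the two expressions produce the same values, sketched on an invented table and grouped by Name (an assumption made so that each group contains an America row):

```r
library(data.table)

dat <- data.table(
  Name   = c("a", "a", "b", "b"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 80, 20)
)

# Original form: subset .SD, then pull the Income column out of it.
viaSD    <- dat[, Income / .SD[Nation == "America", Income], by = Name]
# Simpler form: index the Income vector of the current group directly.
viaIndex <- dat[, Income / Income[Nation == "America"], by = Name]

print(identical(viaSD$V1, viaIndex$V1))
```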

.SD is a convenient shorthand. However, invoking it when you need only some of the columns it holds burdens your code with extra memory costs, because data.table builds .SD from every non-grouping column for each group. If you are using only a single column, refer to that column by name, or add the .SDcols argument (after j) and name the columns you need there.
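To see what .SDcols changes, the sketch below (invented data, including a hypothetical extra Notes column) reports which columns .SD holds per group, then applies .SDcols to the .SD expression from the question:

```r
library(data.table)

dat <- data.table(
  Name   = c("a", "a", "b", "b"),
  Nation = c("America", "France", "America", "France"),
  Income = c(100, 50, 80, 20),
  Notes  = c("x", "y", "z", "w")   # hypothetical column that j never needs
)

# Without .SDcols, .SD holds every non-grouping column for each group.
allCols  <- dat[, list(SDcols = paste(names(.SD), collapse = ",")), by = Name]
# With .SDcols, .SD is restricted to the listed columns.
someCols <- dat[, list(SDcols = paste(names(.SD), collapse = ",")), by = Name,
                .SDcols = c("Nation", "Income")]

# The question's expression, with .SD limited to the two columns it reads.
rel <- dat[, list(RelativeIncome = Income / .SD[Nation == "America", Income]),
           by = Name, .SDcols = c("Nation", "Income")]

print(allCols)
print(someCols)
print(rel)
```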

answered Nov 14 '22 22:11

Ricardo Saporta
