
knitr gets tricked by data.table `:=` assignment

It seems that knitr doesn't understand that DT[, a:=1] should not result in an output of DT to the document. Is there a way to stop this behaviour?

Example knitr document:

    Data.Table Markdown
    ========================================================

    Suppose we make a `data.table` in **R Markdown**

    ```{r}
    DT = data.table(a = rnorm(10))
    ```

    Notice that it doesn't display the contents until we do a

    ```{r}
    DT
    ```

    style command. However, if we want to use `:=` to create another column

    ```{r}
    DT[, c:=5]
    ```

    It would appear that the absence of a equals sign tricks `knitr` into thinking this is to be printed.

Knitr Output:

[Image: screenshot of the rendered knitr document]

Is this a knitr bug or a data.table bug?

EDIT

I have only just noticed that knitr also behaves oddly when it echoes the code. Look at the output above. My source code contains DT[, c:=5], but what knitr renders is

DT[, `:=`(c, 5)] 

Weird...

EDIT 2: Caching

Caching also seems to have a problem with :=, but that must have a different cause, so it is a separate question here: why does knitr caching fail for data.table `:=`?

Corvus asked Mar 07 '13 09:03


2 Answers

Update Oct 2014. Now in data.table v1.9.5 :

:= no longer prints in knitr for consistency with behaviour at the prompt, #505. Output of a test knit("knitr.Rmd") is now in data.table's unit tests.

and related :

if (TRUE) DT[,LHS:=RHS] now doesn't print (thanks to Jureiss, #869). Test added. To get this to work we've had to live with one downside: if a := is used inside a function with no DT[] before the end of the function, then the next time DT is typed at the prompt, nothing will be printed. A repeated DT will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then print(DT) and DT[] at the prompt are guaranteed to print. As before, adding an extra [] on the end of a := query is a recommended idiom to update and then print; e.g. > DT[,foo:=3L][]
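As a rough illustration of the idiom described in that NEWS item, here is a minimal sketch. The helper `add_flag` and the column names are made up for the example and are not part of data.table; the sketch assumes data.table v1.9.5 or later is installed.

```r
library(data.table)

DT <- data.table(a = 1:3)

# Hypothetical helper, not part of data.table: adds a column by reference.
add_flag <- function(dt) {
  dt[, flag := TRUE]  # updates dt in place; the result is returned invisibly
  dt[]                # trailing [] after the last := so later printing behaves normally
}

add_flag(DT)

# Update and print in one step, the idiom recommended in the NEWS item.
DT[, foo := 3L][]
```

If the function cannot be changed, the NEWS item says that `print(DT)` and `DT[]` at the prompt are the calls guaranteed to print.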



Previous answer kept for posterity (the global$depthtrigger business is no longer done as from data.table v1.9.5 so this is no longer true) ...

Just to be clear I understand then: knitr is printing when you don't want it to.

Try increasing data.table:::.global$depthtrigger a little bit at the start of the script.

This will be 3 for you currently :

    data.table:::.global$depthtrigger
    [1] 3

I don't know how much eval depth knitr adds to the stack. But try changing the trigger to 4 first; i.e.

assign("depthtrigger", 4, data.table:::.global) 

and at the end of the knitr script be sure to set it back to 3. If 4 doesn't work, try 5, then 6. If you get to 10 give up and I'll think again. ;-P

Why might this work?

See NEWS from v1.8.4 :

DT[,LHS:=RHS,...] no longer prints DT. This implements #2128 "Try again to get DT[i,j:=value] to return invisibly". Thanks to discussions here :
how to suppress output when using `:=` in R {data.table}, prior to v1.8.3?
http://r.789695.n4.nabble.com/Avoiding-print-when-using-tp4643076.html
FAQs 2.21 and 2.22 have been updated.

FAQ 2.21 Why does DT[i,col:=value] return the whole of DT? I expected either no visible value (consistent with <-), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.
This has changed in v1.8.3 to meet your expectations. Please upgrade. The whole of DT is returned (now invisibly) so that compound syntax can work; e.g., DT[i,done:=TRUE][,sum(done)]. The number of rows updated is returned when verbosity is on, either on a per query basis or globally using options(datatable.verbose=TRUE).

FAQ 2.22 Ok, thanks. What was so difficult about the result of DT[i,col:=value] being returned invisibly?
R internally forces visibility on for [. The value of FunTab's eval column (see src/main/names.c) for [ is 0 meaning force R_Visible on (see R-Internals section 1.6). Therefore, when we tried invisible() or setting R_Visible to 0 directly ourselves, eval in src/main/eval.c would force it on again. To solve this problem, the key was to stop trying to stop the print method running after a :=. Instead, inside := we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.

That global flag is data.table:::.global$print. At the top of data.table:::print.data.table you'll see the flag being checked. That's because there is no known way to suppress printing from [ (as FAQ 2.22 explains).

So, inside := inside [.data.table it looks to see how "deep" this call is :

    if (Cstack_info()[["eval_depth"]] <= .global$depthtrigger) {
        suppPrint = function(x) { .global$print=FALSE; x }
        # Suppress print when returns ok not on error, bug #2376.
        # Thanks to: https://stackoverflow.com/a/13606880/403310
        # All appropriate returns following this point are
        # wrapped i.e. return(suppPrint(x)).
    }

Essentially that's just saying: if DT[,x:=y] is running at the prompt, then I know the REPL is going to call the print method on my result, beyond my control. OK, so given that the print method is going to run, I suppress the output from inside it by setting a flag, since the print method that runs (i.e. print.data.table) is something I can control.
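To make the flag idea concrete, here is a toy sketch in base R. It is not data.table's code: the class `noisy`, the environment `.state` and the function `update_quietly` are all invented for illustration, and a plain function stands in for `:=` inside `[`.

```r
# Toy stand-in for the mechanism described above; every name here is hypothetical.
.state <- new.env()
.state$print <- TRUE

new_noisy <- function(x) structure(list(x = x), class = "noisy")

# Plays the role of `:=`: it cannot stop the print method from being called,
# so it records that the next print should be skipped.
update_quietly <- function(obj, value) {
  obj$x <- value
  .state$print <- FALSE
  obj
}

# The print method is under our control, so it checks the flag first.
print.noisy <- function(x, ...) {
  if (!.state$print) {
    .state$print <- TRUE  # re-arm so the next print behaves normally
    return(invisible(x))
  }
  cat("<noisy>", x$x, "\n")
  invisible(x)
}

obj <- new_noisy(1)
quiet_out <- capture.output(print(update_quietly(obj, 2)))  # print runs but emits nothing
loud_out  <- capture.output(print(obj))                     # flag was re-armed, so this prints
```

In this toy version the flag stays set until some print call consumes it, which is the same kind of trade-off a flag-based approach has to live with.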

In knitr's case it's simulating the REPL in a clever way. It isn't really a script, iiuc, otherwise DT[,x:=y] wouldn't print anyway for that reason. But because it's simulating the REPL via an eval, there is an extra level of eval depth for code run from knitr. Or something similar (I don't know knitr).

Which is why I'm thinking increasing the depthtrigger might do the trick.
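The effect of an extra eval on the depth can be checked with base R alone. This is only a sketch: the helper name `eval_depth` is made up, and the exact numbers depend on the R version and on how the code is run, so none are shown here.

```r
# Helper (illustrative name): report R's current evaluation depth.
eval_depth <- function() Cstack_info()[["eval_depth"]]

d_direct  <- eval_depth()               # called directly
d_wrapped <- eval(quote(eval_depth()))  # called through an extra eval(), roughly what a REPL emulator does

cat("direct:", d_direct, " wrapped:", d_wrapped, "\n")
```

If the wrapped value comes out larger, that is the same kind of difference that would push a := call run under knitr past the depth trigger.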

Hacky/crufty, I agree. But if it works, and you let me know which value works, I can change data.table to be knitr aware and change the depthtrigger automatically. Or any better solutions are most welcome.

Matt Dowle answered Sep 28 '22 07:09


Why not just use:

```{r, results='hide'}
DT[, c:=5]
```
Matt Pollock answered Sep 28 '22 07:09