 

How to suppress output when using `:=` in R {data.table}, prior to v1.8.3?

Tags:

r

data.table

Is there a way to prevent data.table from printing the new data.table after assigning a new column by reference? I gather the standard behaviour is:

    library(data.table)
    example(data.table)
    DT
    #    x y  v
    # 1: a 1 42
    # 2: a 3 42
    # 3: a 6 42
    # 4: b 1 11
    # 5: b 3 11
    # 6: b 6 11
    # 7: c 1  7
    # 8: c 3  8
    # 9: c 6  9
    DT[,z:=1:nrow(DT)]
    #    x y  v z
    # 1: a 1 42 1
    # 2: a 3 42 2
    # 3: a 6 42 3
    # 4: b 1 11 4
    # 5: b 3 11 5
    # 6: b 6 11 6
    # 7: c 1  7 7
    # 8: c 3  8 8
    # 9: c 6  9 9

i.e. the table is printed to the screen after the assignment. Is there a way to stop data.table from showing the new table after assigning the new column z? I know I can stop this behaviour by writing

    DT <- copy(DT[,z:=1:nrow(DT)])

but that defeats the purpose of :=, which is designed to avoid copies.

Florian Oswald asked Jul 06 '12 09:07


1 Answer

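Since := modifies the table in place and returns that same object, assigning the result back with <- does not make a copy; and because an assignment returns its value invisibly, nothing is printed. So you can use <-: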

Create a data.table object:

    library(data.table)
    di <- data.table(iris)

Create a new column:

    di <- di[, z:=1:nrow(di)]
    di
    #       Sepal.Length Sepal.Width Petal.Length Petal.Width Species  z
    #  [1,]          5.1         3.5          1.4         0.2  setosa  1
    #  [2,]          4.9         3.0          1.4         0.2  setosa  2
    #  [3,]          4.7         3.2          1.3         0.2  setosa  3
    #  [4,]          4.6         3.1          1.5         0.2  setosa  4
    #  [5,]          5.0         3.6          1.4         0.2  setosa  5
    #  [6,]          5.4         3.9          1.7         0.4  setosa  6
    #  [7,]          4.6         3.4          1.4         0.3  setosa  7
    #  [8,]          5.0         3.4          1.5         0.2  setosa  8
    #  [9,]          4.4         2.9          1.4         0.2  setosa  9
    # [10,]          4.9         3.1          1.5         0.1  setosa 10
    # First 10 rows of 150 printed.
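To check that assigning the result back with <- really does not copy the table, one can compare memory addresses before and after the update. This is a minimal sketch assuming a current data.table release; it uses data.table's address() helper, and the actual addresses differ from run to run, so only their equality matters.

```r
library(data.table)

di <- data.table(iris)
before <- address(di)          # memory address of the original table

# := adds column z in place; <- merely rebinds di to the same object
di <- di[, z := 1:nrow(di)]

after <- address(di)
identical(before, after)       # TRUE when no copy was made
```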

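It is also worth remembering that R only auto-prints the value of a visible top-level expression, whether it is typed at the console or run from a script with Rscript or R CMD BATCH. Expressions in a file run with source() are not auto-printed by default (its print.eval argument defaults to the value of echo, which is FALSE by default), and neither are expressions inside a function body other than its return value.

So, in a sourced file or inside a function, you can simply use:

    di[, z:=1:nrow(di)]

This will not produce any output there, even with versions of data.table prior to v1.8.3.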
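Whether a bare expression is auto-printed depends on how the code is run, and the sourced-file case can be checked with base R alone. This sketch writes a throwaway two-line script to a temporary file (the file and its contents are purely illustrative) and captures what source() prints with and without print.eval.

```r
# Write a tiny illustrative script: an assignment, then a bare expression
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:3", "x"), script)

# Default source(): print.eval follows echo, which is FALSE, so `x` is not printed
quiet <- capture.output(source(script))

# print.eval = TRUE mimics top-level auto-printing of visible values
loud <- capture.output(source(script, print.eval = TRUE))

quiet  # character(0): nothing was printed
loud   # the single line "[1] 1 2 3"
unlink(script)
```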


Further info from Matthew Dowle:

Also see FAQ 2.21 and 2.22:

2.21 Why does DT[i,col:=value] return the whole of DT? I expected either no visible value (consistent with <-), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.

So that compound syntax can work; e.g., DT[i,done:=TRUE][,sum(done)]. The number of rows updated is returned when verbosity is on, either on a per query basis or globally using options(datatable.verbose=TRUE).
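The compound syntax from FAQ 2.21 can be tried on a small toy table. The table and the done column are illustrative, and the sketch assumes a current data.table release. Rows not selected by i get NA in a newly created column, so it passes na.rm = TRUE to sum(); the per-query verbose argument prints diagnostic messages whose exact wording varies between versions.

```r
library(data.table)

DT <- data.table(id = 1:5)

# Update by reference on rows where id > 2, then chain a second query
# that summarises the freshly added column (rows 1 and 2 hold NA)
n_done <- DT[id > 2, done := TRUE][, sum(done, na.rm = TRUE)]
n_done

# Per-query verbosity: data.table reports what the := step did
DT[id == 1, done := FALSE, verbose = TRUE]
```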

2.22 Ok, but can't the return value of DT[i,col:=value] be returned invisibly, then?

  • We tried to but R internally forces visibility on for [. The value of FunTab's eval column (see src/main/names.c) for [ is 0 meaning force R_Visible on (see R-Internals section 1.6). Therefore, when we tried invisible() or setting R_Visible to 0 directly ourselves, eval in src/main/eval.c would force it on again.
  • After getting used to this behaviour, you might grow to prefer it (we have). After all, how many times do we subassign using <- and then immediately look at the data to check it's ok?
  • We can mix := into a j which also returns data; a mixed update and select in one query. To detect whether j solely updates (and then behave differently) could be confusing.
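The forced visibility described in the first bullet can be observed in base R with withVisible(). This sketch defines a throwaway S3 class, hypothetically named quiet, whose `[` method returns its result with invisible(). Calling the method function directly keeps the result invisible, but going through the `[` primitive turns visibility back on, which is why invisible() alone could not silence `[.data.table`.

```r
# A hypothetical S3 class whose `[` method tries to return invisibly
`[.quiet` <- function(x, i) invisible(unclass(x)[i])

obj <- structure(1:3, class = "quiet")

# Calling the method function directly: invisible() is respected
direct <- withVisible(`[.quiet`(obj, 2))

# Dispatching through the `[` primitive: R forces visibility back on
via_subset <- withVisible(obj[2])

direct$visible      # FALSE
via_subset$visible  # TRUE
```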

Second update from Matthew Dowle:

We have now found a solution and v1.8.3 no longer prints the result when := is used. We will update FAQ 2.21 and 2.22.
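With v1.8.3 or later, the original example needs no workaround. A minimal sketch, assuming a current data.table release and a small illustrative table: the first := line should print nothing at the console, and appending an empty [] is a common idiom for printing the updated table in the same statement.

```r
library(data.table)

DT <- data.table(x = c("a", "b", "c"), y = c(1, 3, 6))

# Adds column z by reference; nothing is printed at the console
DT[, z := seq_len(.N)]

# Appending [] returns the table visibly, so it is printed
DT[, z := z * 10L][]
```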

Andrie answered Oct 01 '22 05:10