data.table - does setkey(...) create an index or physically reorder the rows in a data table?

Tags:

r

data.table

This (very basic) question is the result of an exchange here.

The documentation for setkey() states:

setkey() sorts a data.table and marks it as sorted. The sorted columns are the key. The key can be any columns in any order. The columns are sorted in ascending order always. The table is changed by reference... (emphasis added)

I have always interpreted this to mean that setkey() creates an index, rather than physically rearranging the rows of the data table (similar to indexing a database table). But if this were true, then removing the key (using setkey(DT,NULL)) should remove the index and restore the data table to its original, unsorted order. This is not what happens:

library(data.table)
DT <- data.table(a=3:1, b=1:3, c=5:7); DT
   a b c
1: 3 1 5
2: 2 2 6
3: 1 3 7
setkey(DT,a); DT
   a b c
1: 1 3 7
2: 2 2 6
3: 3 1 5
setkey(DT,NULL); DT
   a b c
1: 1 3 7
2: 2 2 6
3: 3 1 5
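A minimal sketch, reusing only the DT from the transcript above, that queries key() at each step. It separates the key (metadata recorded on the table) from the row order (which setkey() changes physically), so you can see that dropping the key leaves the order alone:

```r
library(data.table)

DT <- data.table(a = 3:1, b = 1:3, c = 5:7)

setkey(DT, a)
print(key(DT))   # "a": the table is now marked as sorted by column a
print(DT$a)      # 1 2 3: the rows themselves were reordered

setkey(DT, NULL)
print(key(DT))   # NULL: the key marker has been removed
print(DT$a)      # still 1 2 3: removing the key does not undo the sort
```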

So two questions:

1: If the rows are rearranged (sorted), then what does "changed by reference" mean?

2: What does setkey(DT,NULL) do exactly?

jlhoward asked Nov 19 '13 16:11


1 Answer

  1. The rows are physically sorted. "Changed by reference" means the reordering happens in place: the table is not copied, and the rows of the existing object are just rearranged.

  2. setkey(DT, NULL) is equivalent to setattr(DT, "sorted", NULL). It simply unsets the "sorted" attribute.
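To make both points concrete, here is a small sketch, assuming a current data.table install. It uses data.table's address() to compare the table's memory address before and after setkey(), binds a second name to the same table by plain assignment to show that it sees the reordering, and then clears the key with setattr() directly, as in point 2. The names DT_alias and before are illustrative only:

```r
library(data.table)

DT <- data.table(a = 3:1, b = 1:3, c = 5:7)
DT_alias <- DT            # plain assignment: no copy, both names refer to one table
before <- address(DT)     # memory address of the table before keying

setkey(DT, a)             # sorts the rows in place and records the key

stopifnot(identical(address(DT), before))  # same object: no new table was allocated
print(DT_alias$a)         # 1 2 3: the alias sees the reordering as well
print(attr(DT, "sorted")) # "a": the key lives in the "sorted" attribute

# Unsetting the attribute directly removes the key, as setkey(DT, NULL) does
setattr(DT, "sorted", NULL)
print(key(DT))            # NULL
print(DT$a)               # 1 2 3: the row order is untouched
```

Because DT_alias and DT are the same object, use copy(DT) first if you need to keep an unsorted version.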

eddi answered Nov 10 '22 10:11