This (very basic) question is the result of an exchange here.
The documentation for setkey() states:
setkey() sorts a data.table and marks it as sorted. The sorted columns are the key. The key can be any columns in any order. The columns are sorted in ascending order always. The table is changed by reference... (emphasis added)
I have always interpreted this to mean that setkey() creates an index, rather than physically rearranging the rows of the data table (similar to indexing a database table). But if this were true, then removing the key (using setkey(DT,NULL)) should remove the index and restore the data table to its original, unsorted order. This is not what happens:
library(data.table)
DT <- data.table(a=3:1, b=1:3, c=5:7); DT
a b c
1: 3 1 5
2: 2 2 6
3: 1 3 7
setkey(DT,a); DT
a b c
1: 1 3 7
2: 2 2 6
3: 3 1 5
setkey(DT,NULL); DT
a b c
1: 1 3 7
2: 2 2 6
3: 3 1 5
So two questions:
1: If the rows are rearranged (sorted), then what does "changed by reference" mean?
2: What does setkey(DT,NULL) do exactly?
The rows are physically sorted. "Changed by reference" here means that no copy of the entire table is made; the rows are just reordered in place.
setkey(DT, NULL) is equivalent to setattr(DT, "sorted", NULL). It simply unsets the "sorted" attribute.
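Both points can be checked directly. The following is a minimal sketch, reusing the example table from the question and assuming that data.table's address() is a reasonable way to confirm that the same object is being modified; the variable name addr_before is only for illustration.

```r
library(data.table)

DT <- data.table(a = 3:1, b = 1:3, c = 5:7)
addr_before <- address(DT)   # address of the table object before keying

setkey(DT, a)                # reorders the rows by 'a' in place
key(DT)                      # "a"
attr(DT, "sorted")           # the key is stored in the "sorted" attribute

# Expected to be TRUE if the table was modified by reference, not copied
identical(address(DT), addr_before)

setkey(DT, NULL)             # removes the key, leaves the rows alone
key(DT)                      # NULL
DT$a                         # still 1 2 3: the sorted row order remains
```

Because the original row order is not stored anywhere, removing the key cannot restore it. If the original order matters, one option is to add a row-number column before calling setkey() and sort on it later.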