Using data.table, say I'm setting the key using two columns, and one of the columns has missing values. Data table seems to sort the NA
values to the first values.
require(data.table)
set.seed(919)
# Create sample data
dt <- data.table(
key1 = rep(1:10, each = 10),
key2 = rep_len(letters, 100)
)
# Set some key2 values to missing
dt[sample(1:100, 10), "key2"] <- NA
# Set key (sort)
setkeyv(dt, c("key1", "key2"))
dt
# 1: 1 NA
# 2: 1 a
# 3: 1 b
# 4: 1 c
# 5: 1 d
# 6: 1 f
# 7: 1 g
# 8: 1 h
# 9: 1 i
# 10: 1 j
# 11: 2 NA
# 12: 2 NA
# 13: 2 k
# 14: 2 m
# 15: 2 n
# 16: 2 o
# 17: 2 p
# 18: 2 q
# 19: 2 r
# 20: 2 s
# 21: 3 a
# 22: 3 b
# 23: 3 c
# 24: 3 d
# 25: 3 u
# 26: 3 v
# 27: 3 w
# 28: 3 x
# 29: 3 y
# 30: 3 z
# 31: 4 e
# 32: 4 f
# 33: 4 g
# 34: 4 h
# 35: 4 i
# 36: 4 j
# 37: 4 k
# 38: 4 l
# 39: 4 m
# 40: 4 n
# 41: 5 NA
# 42: 5 NA
# 43: 5 o
# 44: 5 q
# 45: 5 r
# 46: 5 s
# 47: 5 u
# 48: 5 v
# 49: 5 w
# 50: 5 x
# 51: 6 NA
# 52: 6 a
# 53: 6 b
# 54: 6 c
# 55: 6 d
# 56: 6 e
# 57: 6 g
# 58: 6 h
# 59: 6 y
# 60: 6 z
# 61: 7 i
# 62: 7 j
# 63: 7 k
# 64: 7 l
# 65: 7 m
# 66: 7 n
# 67: 7 o
# 68: 7 p
# 69: 7 q
# 70: 7 r
# 71: 8 NA
# 72: 8 NA
# 73: 8 a
# 74: 8 b
# 75: 8 t
# 76: 8 u
# 77: 8 w
# 78: 8 x
# 79: 8 y
# 80: 8 z
# 81: 9 NA
# 82: 9 c
# 83: 9 d
# 84: 9 e
# 85: 9 f
# 86: 9 h
# 87: 9 i
# 88: 9 j
# 89: 9 k
# 90: 9 l
# 91: 10 NA
# 92: 10 m
# 93: 10 n
# 94: 10 o
# 95: 10 p
# 96: 10 r
# 97: 10 s
# 98: 10 t
# 99: 10 u
# 100: 10 v
# key1 key2
Does this always happen, or will I run into problems if I always assume this is true?
Step1: To perform the sorting process under your data table (not sorted) place Sort DataTable and indicate the Input DataTable you want to sort. Step2: Next, create a Data Table variable that stores the Sorted DT.
Description. setkey sorts a data. table and marks it as sorted with an attribute sorted . The sorted columns are the key. The key can be any number of columns.
Core. Activities. SortDataTable Sorts an entire DataTable by ascending or descending order, based on the values of a specified column.
For setkey()
, data.table behaves like base R sort(x, na.last=FALSE)
, as the sort order (always increasing) is essential for binary search based joins/subsets. Rationale for NA
s appearing first is that:
"NAs are internally large negative number[s]" github.com/Rdatatable/data.table/issues/434
Miscellaneous comments: If you are just looking to reorder your data, you should consider setorder()
, which is capable of sorting in any order and positioning NA
s in the beginning or end.
By the way, the standard syntax there is dt[sample(1:100, 10), key2 := NA]
and you should watch out for mistaking the two-character string "NA"
for NA
(not a problem in your example).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With