Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does data.table sort NA values on key columns?

Tags:

r

data.table

Using data.table, say I'm setting the key using two columns, and one of the columns has missing values. Data table seems to sort the NA values to the first values.

require(data.table)
set.seed(919)

# Create sample data
dt <- data.table(
  key1 = rep(1:10, each = 10),
  key2 = rep_len(letters, 100)
  )

# Set some key2 values to missing
dt[sample(1:100, 10), "key2"] <- NA

# Set key (sort)
setkeyv(dt, c("key1", "key2"))
dt
# 1:    1   NA
# 2:    1    a
# 3:    1    b
# 4:    1    c
# 5:    1    d
# 6:    1    f
# 7:    1    g
# 8:    1    h
# 9:    1    i
# 10:    1    j
# 11:    2   NA
# 12:    2   NA
# 13:    2    k
# 14:    2    m
# 15:    2    n
# 16:    2    o
# 17:    2    p
# 18:    2    q
# 19:    2    r
# 20:    2    s
# 21:    3    a
# 22:    3    b
# 23:    3    c
# 24:    3    d
# 25:    3    u
# 26:    3    v
# 27:    3    w
# 28:    3    x
# 29:    3    y
# 30:    3    z
# 31:    4    e
# 32:    4    f
# 33:    4    g
# 34:    4    h
# 35:    4    i
# 36:    4    j
# 37:    4    k
# 38:    4    l
# 39:    4    m
# 40:    4    n
# 41:    5   NA
# 42:    5   NA
# 43:    5    o
# 44:    5    q
# 45:    5    r
# 46:    5    s
# 47:    5    u
# 48:    5    v
# 49:    5    w
# 50:    5    x
# 51:    6   NA
# 52:    6    a
# 53:    6    b
# 54:    6    c
# 55:    6    d
# 56:    6    e
# 57:    6    g
# 58:    6    h
# 59:    6    y
# 60:    6    z
# 61:    7    i
# 62:    7    j
# 63:    7    k
# 64:    7    l
# 65:    7    m
# 66:    7    n
# 67:    7    o
# 68:    7    p
# 69:    7    q
# 70:    7    r
# 71:    8   NA
# 72:    8   NA
# 73:    8    a
# 74:    8    b
# 75:    8    t
# 76:    8    u
# 77:    8    w
# 78:    8    x
# 79:    8    y
# 80:    8    z
# 81:    9   NA
# 82:    9    c
# 83:    9    d
# 84:    9    e
# 85:    9    f
# 86:    9    h
# 87:    9    i
# 88:    9    j
# 89:    9    k
# 90:    9    l
# 91:   10   NA
# 92:   10    m
# 93:   10    n
# 94:   10    o
# 95:   10    p
# 96:   10    r
# 97:   10    s
# 98:   10    t
# 99:   10    u
# 100:   10    v
# key1 key2

Does this always happen, or will I run into problems if I always assume this is true?

like image 561
Jake Fisher Avatar asked Sep 28 '22 21:09

Jake Fisher


People also ask

How do you sort a DataTable based on a column in Uipath?

Step1: To perform the sorting process under your data table (not sorted) place Sort DataTable and indicate the Input DataTable you want to sort. Step2: Next, create a Data Table variable that stores the Sorted DT.

What does setkey mean in R?

Description. setkey sorts a data. table and marks it as sorted with an attribute sorted . The sorted columns are the key. The key can be any number of columns.

What activity should you use to sort a DataTable ascending in Uipath?

Core. Activities. SortDataTable Sorts an entire DataTable by ascending or descending order, based on the values of a specified column.


1 Answers

For setkey(), data.table behaves like base R sort(x, na.last=FALSE), as the sort order (always increasing) is essential for binary search based joins/subsets. Rationale for NAs appearing first is that:

"NAs are internally large negative number[s]" github.com/Rdatatable/data.table/issues/434


Miscellaneous comments: If you are just looking to reorder your data, you should consider setorder(), which is capable of sorting in any order and positioning NAs in the beginning or end.

By the way, the standard syntax there is dt[sample(1:100, 10), key2 := NA] and you should watch out for mistaking the two-character string "NA" for NA (not a problem in your example).

like image 162
Frank Avatar answered Oct 27 '22 08:10

Frank