How does data.table sort NA values on key columns?

Tags:

data.table

Using data.table, say I'm setting the key using two columns, and one of the columns has missing values. Data table seems to sort the NA values to the first values.

Click to copy

require(data.table)
set.seed(919)

# Create sample data
dt <- data.table(
  key1 = rep(1:10, each = 10),
  key2 = rep_len(letters, 100)
  )

# Set some key2 values to missing
dt[sample(1:100, 10), "key2"] <- NA

# Set key (sort)
setkeyv(dt, c("key1", "key2"))
dt
# 1:    1   NA
# 2:    1    a
# 3:    1    b
# 4:    1    c
# 5:    1    d
# 6:    1    f
# 7:    1    g
# 8:    1    h
# 9:    1    i
# 10:    1    j
# 11:    2   NA
# 12:    2   NA
# 13:    2    k
# 14:    2    m
# 15:    2    n
# 16:    2    o
# 17:    2    p
# 18:    2    q
# 19:    2    r
# 20:    2    s
# 21:    3    a
# 22:    3    b
# 23:    3    c
# 24:    3    d
# 25:    3    u
# 26:    3    v
# 27:    3    w
# 28:    3    x
# 29:    3    y
# 30:    3    z
# 31:    4    e
# 32:    4    f
# 33:    4    g
# 34:    4    h
# 35:    4    i
# 36:    4    j
# 37:    4    k
# 38:    4    l
# 39:    4    m
# 40:    4    n
# 41:    5   NA
# 42:    5   NA
# 43:    5    o
# 44:    5    q
# 45:    5    r
# 46:    5    s
# 47:    5    u
# 48:    5    v
# 49:    5    w
# 50:    5    x
# 51:    6   NA
# 52:    6    a
# 53:    6    b
# 54:    6    c
# 55:    6    d
# 56:    6    e
# 57:    6    g
# 58:    6    h
# 59:    6    y
# 60:    6    z
# 61:    7    i
# 62:    7    j
# 63:    7    k
# 64:    7    l
# 65:    7    m
# 66:    7    n
# 67:    7    o
# 68:    7    p
# 69:    7    q
# 70:    7    r
# 71:    8   NA
# 72:    8   NA
# 73:    8    a
# 74:    8    b
# 75:    8    t
# 76:    8    u
# 77:    8    w
# 78:    8    x
# 79:    8    y
# 80:    8    z
# 81:    9   NA
# 82:    9    c
# 83:    9    d
# 84:    9    e
# 85:    9    f
# 86:    9    h
# 87:    9    i
# 88:    9    j
# 89:    9    k
# 90:    9    l
# 91:   10   NA
# 92:   10    m
# 93:   10    n
# 94:   10    o
# 95:   10    p
# 96:   10    r
# 97:   10    s
# 98:   10    t
# 99:   10    u
# 100:   10    v
# key1 key2

Does this always happen, or will I run into problems if I always assume this is true?

561

asked Sep 28 '22 21:09

1 Answers

For setkey(), data.table behaves like base R sort(x, na.last=FALSE), as the sort order (always increasing) is essential for binary search based joins/subsets. Rationale for NAs appearing first is that:

"NAs are internally large negative number[s]" github.com/Rdatatable/data.table/issues/434

Miscellaneous comments: If you are just looking to reorder your data, you should consider setorder(), which is capable of sorting in any order and positioning NAs in the beginning or end.

By the way, the standard syntax there is dt[sample(1:100, 10), key2 := NA] and you should watch out for mistaking the two-character string "NA" for NA (not a problem in your example).

162

answered Oct 27 '22 08:10

Frank

Related questions
                            
                                Keep the 2 highest values of each row in a list of matrices r
                            
                                How to use UTF-8 characters in dygraph title in R
                            
                                Clip lines to plot area and display text outside plot area
                            
                                plotly not getting geom_text in R / ggplot
                            
                                In R, use regular expression to match multiple patterns and add new column to list
                            
                                Leaflet in R not showing variable circle colors
                            
                                How to calculate accuracy from table matrix
                            
                                creating a data frame with 100 columns of desired column name in R
                            
                                R: Warning when creating a (long) list of dummies
                            
                                different output using stringi and gsub (using the same pattern on the same string)
                            
                                Weights argument in R gbm function
                            
                                Argument "No" is missing, with no default
                            
                                Shiny + DT: how to make datatable reactive?
                            
                                shinyapp.io rename deploy path
                            
                                R - histogram within histogram
                            
                                Setting R_LIBS & avoiding "Would you like to use a personal library instead?"
                            
                                Unable to load any package in R (unable to load shared object)
                            
                                How can a function return its name and arguments in R
                            
                                pheatmap scale="row" giving Error in hclust(d, method = method) : NA/NaN/Inf in foreign function call
                            
                                Including an image using roxygen documentation

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does data.table sort NA values on key columns?

Tags:

r

data.table

Jake Fisher

People also ask

1 Answers

Frank

Recent Activity

Donate For Us