How do I select all the rows that have a missing value in the primary key of a data.table?
    DT = data.table(x=rep(c("a","b",NA),each=3), y=c(1,3,6), v=1:9)
    setkey(DT,x)
Selecting a particular value is easy:
DT["a",]
Selecting for the missing values seems to require a vector search. One cannot use binary search. Am I correct?
    DT[NA,]         # does not work
    DT[is.na(x),]   # does work
Fortunately, DT[is.na(x),] is nearly as fast as (e.g.) DT["a",], so in practice, this may not really matter much:
    library(data.table)
    library(rbenchmark)
    DT = data.table(x=rep(c("a","b",NA),each=3e6), y=c(1,3,6), v=1:9)
    setkey(DT,x)
    benchmark(DT["a",], DT[is.na(x),], replications=20)
    #             test replications elapsed relative user.self sys.self user.child
    # 1      DT["a", ]           20    9.18    1.000      7.31     1.83         NA
    # 2 DT[is.na(x), ]           20   10.55    1.149      8.69     1.85         NA
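As a quick sanity check on a small table, here is a minimal sketch showing that the vector-scan form picks out exactly the rows whose key is NA. The names `na_rows` and `na_idx` are only illustrative, and `y` is written out with `rep()` so its length is explicit.

```r
library(data.table)

# Small version of the table from the question, keyed on x.
DT <- data.table(x = rep(c("a", "b", NA), each = 3),
                 y = rep(c(1, 3, 6), times = 3),
                 v = 1:9)
setkey(DT, x)

# Vector scan: is.na() is evaluated over the whole key column.
na_rows <- DT[is.na(x)]

# which = TRUE returns the matching row numbers instead of the rows.
na_idx <- DT[is.na(x), which = TRUE]

print(na_rows)
print(na_idx)
```

Using `which = TRUE` is handy when only the row positions are needed, since it avoids copying the subset.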
===
Addition from Matthew (won't fit in a comment):
The data above has 3 very large groups, though. So the speed advantage of binary search is dominated here by the time to create the large subset (1/3 of the data is copied).
    benchmark(DT["a",],                  # repeat select of large subset on my netbook
              DT[is.na(x),],
              replications=3)
              test replications elapsed relative user.self sys.self
         DT["a", ]            3   2.406    1.000     2.357    0.044
    DT[is.na(x), ]            3   3.876    1.611     3.812    0.056

    benchmark(DT["a",which=TRUE],        # isolate search time
              DT[is.na(x),which=TRUE],
              replications=3)
                          test replications elapsed relative user.self sys.self
         DT["a", which = TRUE]            3   0.492    1.000     0.492    0.000
    DT[is.na(x), which = TRUE]            3   2.941    5.978     2.932    0.004
As the size of the subset returned decreases (e.g. with more, smaller groups), the difference becomes apparent. A vector scan on a single column isn't too bad, but with 2 or more columns performance quickly degrades.
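To try the "more groups" case yourself, here is a rough sketch. The data is made up for illustration (10,000 hypothetical keys of the form "k00001" plus 100 NA keys), and the timings depend on machine and data.table version, so none are shown.

```r
library(data.table)

set.seed(1)
n <- 1e6

# Hypothetical data: 10,000 distinct keys, with 100 rows set to NA.
keys <- sprintf("k%05d", sample.int(1e4, n, replace = TRUE))
keys[sample.int(n, 100)] <- NA
DT <- data.table(x = keys, v = seq_len(n))
setkey(DT, x)

# Binary search on the key for one small group.
t_key <- system.time(i_key <- DT["k00001", which = TRUE])

# Vector scan over the whole key column for the NA group.
t_scan <- system.time(i_scan <- DT[is.na(x), which = TRUE])

cat("binary search elapsed:", t_key[["elapsed"]], "\n")
cat("vector scan elapsed:  ", t_scan[["elapsed"]], "\n")
```

A single call on a table this size may be too quick to time reliably; wrapping both expressions in benchmark() with several replications, as above, gives steadier numbers.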
Maybe NAs should be joinable too. I seem to remember a gotcha with that, though. Here's some history linked from FR#1043, "Allow or disallow NA in keys?". It mentions there that NA_integer_ is internally a negative integer. That trips up radix/counting sort (iirc), resulting in setkey going slower. But it's on the list to revisit.
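The point about NA_integer_ can be seen from base R alone. The sketch below only looks at how R stores the value and says nothing about data.table's own sorting code; `na_bytes` is just an illustrative name.

```r
# Big-endian bytes of NA_integer_: the sign bit is set and all other bits
# are zero, which is the two's-complement bit pattern of -2^31.
na_bytes <- writeBin(NA_integer_, raw(), endian = "big")
print(na_bytes)

# The smallest ordinary (non-NA) integer in R is one above that value.
print(-.Machine$integer.max)
```

So a sort that works on the raw integer values would see NA as the most negative value of all, which is presumably why it needs special handling.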