I'm attempting to use data.table to match the nearest decimal value in a vector, but I'm running into a situation where more than one result is returned. The simplified example below returns two values (0.1818182 and 0.2727273), but using a less precise value for x (e.g. 0.0275) returns a single match (0.1818182).
library(data.table)
x = 0.0275016249293408
dt = data.table(rnk = c(0, 0.0909090909090909,
                        0.181818181818182, 0.272727272727273),
                val = c(0.0233775088495975, 0.0270831481152598,
                        0.0275016216267234, 0.0275016249293408),
                key = "val")
dt[J(x), roll = "nearest"][, ifelse(is.na(val), NA_real_, rnk)]
I'm assuming the problem is related to the precision of the numeric values I'm using for this comparison. Is there a limitation to the decimal precision that can be used for a nearest match (i.e. do I need to round the data points)? Is there a better way to accomplish this nearest match?
Yes, data.table automatically applies a tolerance when joining and grouping numeric columns. The tolerance in v1.8.10 is sqrt(.Machine$double.eps) == 1.490116e-08. This comes directly from ?base::all.equal.
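As a quick sketch of that tolerance (this snippet is mine, not part of the original answer):

```r
# sqrt(.Machine$double.eps) is the default tolerance used by all.equal()
sqrt(.Machine$double.eps)        # 1.490116e-08
all.equal(1, 1 + 1e-9)           # TRUE: relative difference is within tolerance
isTRUE(all.equal(1, 1 + 1e-7))   # FALSE: relative difference exceeds tolerance
```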
To illustrate, consider grouping:
> dt
rnk val
1: 0.00000000 0.02337751
2: 0.09090909 0.02708315
3: 0.18181818 0.02750162
4: 0.27272727 0.02750162
> dt[,.N,by=val]
val N
1: 0.02337751 1
2: 0.02708315 1
3: 0.02750162 2 # one group, size two
>
When you joined using dt[J(x), roll="nearest"], that x value matched within tolerance, and you got the group it matched to, as usual when a matching value occurs in a rolling join. roll="nearest" only applies to values that don't match, i.e. those outside tolerance.
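The exact-match-first behaviour of roll="nearest" can be seen in isolation with a small made-up table (dtt and its values are illustrative only):

```r
library(data.table)
dtt <- data.table(k = c(1, 2, 4), v = c("a", "b", "c"), key = "k")
dtt[J(2),   roll = "nearest"]$v  # "b": exact match on k == 2, roll is never needed
dtt[J(3.4), roll = "nearest"]$v  # "c": no exact match, so the nearest key (4) wins
```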
data.table considers the values in rows 3 and 4 of val to be equal. The thinking behind this is convenience: most of the time, key values really have a fixed precision, such as prices ($1.23) or measurements recorded to a specified precision (1.234567). We'd like to join and group such numerics even after multiplying them, for example, without needing to code for machine accuracy ourselves. And we'd like to avoid the confusion that arises when numeric data displays as though it's equal in a table but isn't, due to very tiny differences in the bit representation.
See ?unique.data.table for this example:
DT = data.table(a=tan(pi*(1/4 + 1:10)), b=rep(1,10)) # example from ?all.equal
length(unique(DT$a)) # 10 strictly unique floating point values
all.equal(DT$a,rep(1,10)) # TRUE, all within tolerance of 1.0
DT[,which.min(a)] # row 10, the strictly smallest floating point value
identical(unique(DT),DT[1]) # TRUE, stable within tolerance
identical(unique(DT),DT[10]) # FALSE
data.table is also stable within tolerance; i.e., when you group by a numeric, the original order of the items within each group is maintained as usual.
> dt$val[3] < dt$val[4] # in your example data, 3 is strictly less than 4
[1] TRUE
> dt[, row:=1:4] # add a row number to illustrate
> dt[, list(.N, list(row)), by=val]
val N V2
1: 0.02337751 1 1
2: 0.02708315 1 2
3: 0.02750162 2 3,4
> dt[3:4, val:=rev(val)] # swap the two values around
> dt$val[3] > dt$val[4]
[1] TRUE
> dt[, list(.N, list(row)), by=val]
val N V2
1: 0.02337751 1 1
2: 0.02708315 1 2
3: 0.02750162 2 3,4 # same result, consistent. stable within tolerance
Referring to Matt's answer, there is an easy way to use all 15 significant digits a double offers in order to select the closest matching row correctly. Instead of working on the original values, one can scale them up so that the significant digits lie above the 10^(-8) tolerance level. This can be done as follows:
orig_vals <- dt[,val]
scale_fact <- max(10^(trunc(log10(abs(orig_vals)))+8))
scaled_vals <- orig_vals * scale_fact
dt[,scaled_val:=scaled_vals]
setkey(dt,scaled_val)
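For the example data the scaling works out as follows (a sketch recomputing the factor, not part of the original answer):

```r
orig_vals  <- c(0.0233775088495975, 0.0270831481152598,
                0.0275016216267234, 0.0275016249293408)
scale_fact <- max(10^(trunc(log10(abs(orig_vals))) + 8))
scale_fact                               # 1e+07 for values in this range
min(diff(sort(orig_vals * scale_fact)))  # ~0.033, far above the 1.5e-08 tolerance
```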
Now, performing the rolling join
scaled_x <- x*scale_fact
dt[J(scaled_x), roll="nearest"][, ifelse(is.na(val), NA_real_, rnk)]
# [1] 0.2727273
yields, as desired, a single value.
If only one row should be selected even when two key values are identical, the mult="first" argument can be added to the data.table call above.
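The effect of mult="first" is easiest to see with an explicitly duplicated key (dup is a made-up example):

```r
library(data.table)
dup <- data.table(k = c(1, 2, 2), v = c("a", "b", "c"), key = "k")
dup[J(2)]$v                  # "b" "c": both rows with k == 2 are returned
dup[J(2), mult = "first"]$v  # "b": only the first matching row
```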