I can successfully use foverlaps with a small sample of my dataset, but when I use the full data (data.tables with over 30k rows), it breaks down and throws the following error:
Error message:
Error in if (any(x[[xintervals[2L]]] - x[[xintervals[1L]]] < 0L)) stop("All entries in column ", :
missing value where TRUE/FALSE needed
The way I am interpreting the error message is that there are no overlaps between the two data.tables.
Q1: Am I interpreting the message correctly?
Q2: Any idea why this might happen with the larger dataset? Is it possible that this is due to the size of the dataset?
I do have a lot of unique values, which according to the foverlaps help file can be expected to slow things down proportionally, but not before it gets into millions of rows, which is far from being the case here. Thank you.
Very briefly, foverlaps() collapses the two-column interval in y to one column of unique values to generate a lookup table, and then performs the join depending on the type of overlap, using the already available binary search feature of data.table.
Default value is any. Allowed values are any, within, start, end and equal. The types shown here are identical in functionality to the function findOverlaps in the Bioconductor package IRanges. Let [a,b] and [c,d] be intervals in x and y with a<=b and c<=d. For type="start", the intervals overlap iff a == c.
For intervals [a,b] and [c,d], where a<=b and c<=d, when c > b or d < a, the two intervals don't overlap. If the gap between these two intervals is <= maxgap, these two intervals are considered as overlapping. Note: This is not yet implemented. It should be a positive integer value, > 0. Default is 1.
The time (and space) required to generate the lookup is therefore proportional to the number of unique values present in the interval columns of y when combined together. Overlap joins take advantage of the fact that y is sorted to speed up finding overlaps. Therefore y has to be keyed (see ?setkey) prior to running foverlaps().
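The help-file excerpts above can be illustrated with a small sketch. The toy intervals and the helper count_hits() are hypothetical and only for illustration; the sketch assumes a reasonably recent data.table and uses which = TRUE so that only matching row indices are returned.

```r
library(data.table)

# Toy intervals (hypothetical data, chosen so the overlap types differ).
x <- data.table(start = c(5L, 10L), end = c(8L, 40L))
y <- data.table(start = c(5L, 1L), end = c(9L, 30L))

# y must be keyed on its interval columns before calling foverlaps();
# setkey() also sorts y, giving the rows [1,30] and [5,9].
setkey(y, start, end)

# Hypothetical helper: number of (x row, y row) matches for one overlap type.
# With which = TRUE the result has one column of x row numbers and one of
# y row numbers; unmatched x rows get NA in the second column.
count_hits <- function(type) {
  hits <- foverlaps(x, y, by.x = c("start", "end"), type = type, which = TRUE)
  sum(!is.na(hits[[2L]]))
}

n_any    <- count_hits("any")     # any shared point between the intervals
n_within <- count_hits("within")  # x interval lies entirely inside y interval
n_start  <- count_hits("start")   # interval starts are equal (a == c)

print(c(any = n_any, within = n_within, start = n_start))
```

With these toy intervals, [10,40] overlaps [1,30] but is not contained in it, so type = "within" finds fewer matches than type = "any", and only [5,8] shares a start with a row of y.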
There is no reproducible example so it is not possible to investigate your issue.
As Carl stated in a comment, it is likely caused by NA values present in the input.
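Since no data was posted, the following is only a sketch of how one might check for this, using made-up tables x and y with interval columns named start and end (all of these names are assumptions). An NA in an interval column makes the comparison inside that if() evaluate to NA, which would explain the "missing value where TRUE/FALSE needed" message; it does not mean that there are no overlaps.

```r
library(data.table)

# Toy data (hypothetical): the third row of x has a missing interval start.
x <- data.table(start = c(1L, 5L, NA), end = c(3L, 8L, 12L))
y <- data.table(start = c(2L, 6L), end = c(4L, 9L))
setkey(y, start, end)  # foverlaps() requires y to be keyed

# Calling foverlaps() on x as-is is expected to fail because of the NA;
# the exact wording of the message depends on the installed version.
msg <- tryCatch(
  {
    foverlaps(x, y, by.x = c("start", "end"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Count missing values in each interval column of x.
na_counts <- x[, lapply(.SD, function(col) sum(is.na(col))),
               .SDcols = c("start", "end")]
print(na_counts)

# Drop rows with a missing endpoint, then run the overlap join.
x_clean <- x[!is.na(start) & !is.na(end)]
res <- foverlaps(x_clean, y, by.x = c("start", "end"), type = "any")
print(res)
```

The same NA check is worth running on the interval columns of y as well, since a small sample can easily happen to contain no missing values while the full 30k rows do.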
In the recent development version, Arun has made some improvements to foverlaps. One of those improvements is a better error message when NA values are detected.
install.packages("data.table")
This feature is already on CRAN as of 1.12.2.
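To see whether an installation already includes that release, a quick check along these lines should work (a minimal sketch; the variable names are arbitrary):

```r
library(data.table)

# Compare the installed data.table version against 1.12.2, the CRAN release
# mentioned above as including the clearer NA error message.
installed <- packageVersion("data.table")
has_clearer_na_error <- installed >= "1.12.2"

print(installed)
print(has_clearer_na_error)
```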