
Foverlaps error: Error in if (any(x[[xintervals[2L]]] - x[[xintervals[1L]]] < 0L)) stop

Tags: r, data.table

I can successfully use foverlaps with a small sample of my dataset, but when I use the full data (data.tables with over 30k rows), it breaks down and throws the following error:

Error message:

Error in if (any(x[[xintervals[2L]]] - x[[xintervals[1L]]] < 0L)) stop("All entries in column ",  :
  missing value where TRUE/FALSE needed

The way I am interpreting the error message is that there are no overlaps between the two data.tables.

Q1-Am I interpreting the message correctly?

Q2-Any idea why this might happen with the larger dataset? Is it possible that this is due to the size of the dataset?

I do have a lot of unique values, which according to the foverlaps help file can be expected to slow things down proportionally, but not before the data gets into millions of rows, which is far from being the case here. Thank you.

jpinelo asked May 07 '15 13:05




1 Answer

There is no reproducible example, so it is not possible to investigate your issue.
As stated by Carl in a comment, it is likely caused by NA values present in the input.
In the recent development version, Arun has made some improvements to foverlaps. One of those improvements is a better error message when NA values are detected.

install.packages("data.table")

This feature is already on CRAN as of 1.12.2.
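Without the original data this can only be illustrated on a toy case. The sketch below assumes two hypothetical tables whose interval columns are named start and end (these names are not from the question). It shows how a single NA makes the check quoted in the error message evaluate to NA, and one possible way to count and drop incomplete intervals before calling foverlaps:

```r
library(data.table)

# Hypothetical toy tables; the column names start/end are assumptions.
x <- data.table(start = c(1L, 5L, 10L), end = c(3L, NA, 12L))
y <- data.table(start = c(2L, 11L), end = c(4L, 15L))

# Same shape as the check in the error message: one NA makes any() return NA,
# and if (NA) fails with "missing value where TRUE/FALSE needed".
check <- any(x$end - x$start < 0L)

# Count rows with a missing interval bound in each table.
na_x <- x[, sum(is.na(start) | is.na(end))]
na_y <- y[, sum(is.na(start) | is.na(end))]

# Reversed intervals (start > end) are also rejected by foverlaps.
bad_x <- x[!is.na(start) & !is.na(end) & start > end, .N]

# Drop incomplete intervals, key y, then run the overlap join.
x_clean <- x[!is.na(start) & !is.na(end)]
setkey(y, start, end)
res <- foverlaps(x_clean, y, type = "any", nomatch = 0L)
```

If na_x or na_y is non-zero on the full data, that would be consistent with the NA explanation above; whether dropping those rows is appropriate depends on what the missing values mean in the dataset.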

jangorecki answered Oct 16 '22 18:10