Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize MongoDB query with both $gt and $lte?

I have the following query that is kind of like a reverse range lookup:

db.ip_ranges.find({ $and: [{ start_ip_num: { $lte: 1204135028 } }, { end_ip_num: { $gt: 1204135028 } }] })

When run with only the $lte identifier, the query returns right away. But when I run with both the $gt and $lte in the same query, it is extremely slow (in seconds).

Both the start_ip_num and end_ip_num fields are indexed.

How can I go about optimizing this query?

EDIT

I get the following when I use the explain() function on the query:

{
    "cursor" : "BtreeCursor start_ip_num_1",
    "nscanned" : 452336,
    "nscannedObjects" : 452336,
    "n" : 1,
    "millis" : 2218,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "start_ip_num" : [
            [
                -1.7976931348623157e+308,
                1204135028
            ]
        ]
    }
}

EDIT 2

Once I added the compound index, the explain() function returns the following:

{
    "cursor" : "BtreeCursor start_ip_num_1_end_ip_num_1",
    "nscanned" : 431776,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 3433,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "start_ip_num" : [
            [
                -1.7976931348623157e+308,
                1204135028
            ]
        ],
        "end_ip_num" : [
            [
                1204135028,
                1.7976931348623157e+308
            ]
        ]
    }
}

However, the perf is still poor (in seconds).

like image 392
Bryan Avatar asked Oct 24 '12 20:10

Bryan


1 Answers

So, double range queries are ill-advised in Mongo. I assume you have a single index containing both {start_ip_num: 1, end_ip_num: 1}.

If that doesn't get you close enough (often it is still slow if you have enough data returned by the first field, since it has to do a lot of B-tree scanning), there is one trick you can do to combat this using 2D box queries (only works for two ranges at a time).

Basically, you put a 2D geo index on a field containing the two points in an array, like [start_ip, end_ip], and give it a high enough min/max value so that it won't hit the limits which are by default just -180/180.

Finally, use a bounds query with the range going from min to the $lte value on one corner of the box, and the gt and the max value on the other corner of the box. See http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-BoundsQueries for syntax.

It looks something like this:

db.ip_ranges.find({ip_range:{$within:{$box:[[0, 1204135028], [1204135028, max]]}}});

where max is the biggest ip you can have.

It's been a while since I've looked a this, so the box might be wrong, but the concept is sound, and it made the double range queries perform a bit better than with a regular two field B-tree index. Consistently under a second (although usually a few hundred ms), compared to a few seconds with the regular index--I think I had hundreds of millions of docs at the time, but it's been a while so take these remembered benchmarks with a grain of salt. Results will vary greatly depending on your data and range sizes, I'm sure.

Update: You might want to experiment with the bits setting, trying a low number and a high number to see if it makes a difference. For me, it didn't seem to affect the queries on average. See http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-CreatingtheIndex for syntax.

like image 57
Eve Freeman Avatar answered Oct 18 '22 22:10

Eve Freeman