I have the following query that is kind of like a reverse range lookup:
db.ip_ranges.find({ $and: [{ start_ip_num: { $lte: 1204135028 } }, { end_ip_num: { $gt: 1204135028 } }] })
When run with only the $lte identifier, the query returns right away. But when I run with both the $gt and $lte in the same query, it is extremely slow (in seconds).
Both the start_ip_num and end_ip_num fields are indexed.
How can I go about optimizing this query?
EDIT
I get the following when I use the explain() function on the query:
{
"cursor" : "BtreeCursor start_ip_num_1",
"nscanned" : 452336,
"nscannedObjects" : 452336,
"n" : 1,
"millis" : 2218,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"start_ip_num" : [
[
-1.7976931348623157e+308,
1204135028
]
]
}
}
EDIT 2
Once I added the compound index, the explain() function returns the following:
{
"cursor" : "BtreeCursor start_ip_num_1_end_ip_num_1",
"nscanned" : 431776,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 3433,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"start_ip_num" : [
[
-1.7976931348623157e+308,
1204135028
]
],
"end_ip_num" : [
[
1204135028,
1.7976931348623157e+308
]
]
}
}
However, the perf is still poor (in seconds).
So, double range queries are ill-advised in Mongo. I assume you have a single index containing both {start_ip_num: 1, end_ip_num: 1}
.
If that doesn't get you close enough (often it is still slow if you have enough data returned by the first field, since it has to do a lot of B-tree scanning), there is one trick you can do to combat this using 2D box queries (only works for two ranges at a time).
Basically, you put a 2D geo index on a field containing the two points in an array, like [start_ip, end_ip], and give it a high enough min/max value so that it won't hit the limits which are by default just -180/180.
Finally, use a bounds query with the range going from min to the $lte value on one corner of the box, and the gt and the max value on the other corner of the box. See http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-BoundsQueries for syntax.
It looks something like this:
db.ip_ranges.find({ip_range:{$within:{$box:[[0, 1204135028], [1204135028, max]]}}});
where max is the biggest ip you can have.
It's been a while since I've looked a this, so the box might be wrong, but the concept is sound, and it made the double range queries perform a bit better than with a regular two field B-tree index. Consistently under a second (although usually a few hundred ms), compared to a few seconds with the regular index--I think I had hundreds of millions of docs at the time, but it's been a while so take these remembered benchmarks with a grain of salt. Results will vary greatly depending on your data and range sizes, I'm sure.
Update: You might want to experiment with the bits
setting, trying a low number and a high number to see if it makes a difference. For me, it didn't seem to affect the queries on average. See http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-CreatingtheIndex for syntax.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With