I've started a new project using Node.js and MongoDB, and after almost two days I've gathered about 600k objects in MongoDB. I'm already noticing a huge (negative) impact on performance, and I'm starting to wonder whether I should move to another DB while I still can, or stick with Mongo and do some (more) optimization.
Basically I'm storing coordinates like this:
[x1] => 687
[y1] => 167
[x2] => 686
[y2] => 167
[c] => 0
[s] => 0
[m] => 1299430700312
[_id] => MongoId Object (
[$id] => 4d73bd2c82bb5926780001ec
)
Nothing more... and my queries look like this:
{'$or': [ { x1: {'$gte': 0, '$lt': 1000}, y1: {'$gte': 0, '$lt': 1000} }, { x2: {'$gte': 0, '$lt': 1000}, y2: {'$gte': 0, '$lt': 1000} } ] }
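For reference, the same bounding-box filter can be built as a plain JavaScript object, e.g. for the Node.js driver (a minimal sketch; the 0..1000 bounds are taken from the query above, and note that each `$or` branch must be its own object):

```javascript
// Match documents whose first corner (x1, y1) OR second corner (x2, y2)
// falls inside the 1000x1000 box anchored at the origin.
const filter = {
  $or: [
    { x1: { $gte: 0, $lt: 1000 }, y1: { $gte: 0, $lt: 1000 } },
    { x2: { $gte: 0, $lt: 1000 }, y2: { $gte: 0, $lt: 1000 } }
  ]
};

// With the Node.js driver this would be passed to find(), e.g.
// collection.find(filter).toArray(callback)
console.log(filter.$or.length); // 2
```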
I've tried setting an index on each of the fields x1, y1, x2, y2 individually, as well as the compound indexes {x1:1,y1:1} and {x2:1,y2:1}.
Furthermore, I've only fetched the fields I actually need... but still, a query with a result set of ~40k documents takes 2-8 seconds.
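Fetching only the required fields corresponds to a projection; a minimal sketch (field names from the document above, second-argument form as in the mongo shell and the classic Node.js driver; the collection name `coords` is hypothetical):

```javascript
// Return only the coordinate fields and explicitly drop _id,
// so each returned document stays as small as possible.
const projection = { x1: 1, y1: 1, x2: 1, y2: 1, _id: 0 };

// Shell / classic driver form:
// db.coords.find(filter, projection)
console.log(Object.keys(projection)); // [ 'x1', 'y1', 'x2', 'y2', '_id' ]
```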
Btw: performing the same query in PHP died with an Out-of-Memory message (256MB RAM limit).
The machine is an Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz with 8GB of RAM; it's not the dustiest one in the rack ;)
I'm really running out of ideas, and I see millions and millions of rows coming over the next weeks. As you've probably noticed, the rows are relatively small. Would MySQL with partitioning perform better? Any other NoSQL DB?
And please, no trolling about how "2-8 secs isn't slow" - it's already becoming a problem. When a couple of uncached requests hit the machine at the same time, the load rises above 4 with fewer than 10 users accessing it.
UPDATE: Thanks to all of you who took the time to think about my issue. The suggestion of using geospatial indexes seems to be the answer I was looking for. Besides the fact that the indexes are more efficient in MongoDB, the way of querying entire boxes simply rocks!
To give some facts: I've just started to rewrite my code and collection data, and began with a simple comparison. My data before looked like this:
[x1] => 190
[y1] => 18
[x2] => 192
[y2] => 18
[c] => 0
[s] => 0
[b] => Array (
[0] => 0
[1] => 0
)
[m] => 1299365242802
[r] => 32596
[_id] => MongoId Object (
[$id] => 4d72bd7af0528ea82f000003
)
The indexes were:
{x1:1,y1:1}, {x2:1,y2:1}
Now my data looks like this:
[_id] => MongoId Object (
[$id] => 4d825799b15953b90d000000
)
[coords] => Array (
[x] => 190
[y] => 18
)
[x2] => 192
[y2] => 18
[s] => 0
[c] => 0
[m] => 1299365242802
[r] => 32596
The index is:
{coords: '2d'}
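A sketch of how the 2d index and a box query fit together (the collection name `points` is hypothetical; `$within` with `$box` is the geospatial box operator of the MongoDB 1.x era, later renamed `$geoWithin`):

```javascript
// Index spec: declare the embedded coords document as a 2d geospatial index.
// In the mongo shell: db.points.ensureIndex({ coords: "2d" });
const indexSpec = { coords: "2d" };

// A single query then fetches everything inside a 400x400 box
// anchored at (0, 0), instead of four range conditions under $or:
const boxQuery = {
  coords: { $within: { $box: [[0, 0], [400, 400]] } }
};

// In the shell: db.points.find(boxQuery)
console.log(JSON.stringify(boxQuery.coords.$within.$box)); // [[0,0],[400,400]]
```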
I compared two scripts. The first one queries a box of 400x400 pixels from the old collection, and it took:
real 0m0.375s user 0m0.348s sys 0m0.021s
The second script queries the same box, but using the geospatial index:
real 0m0.107s user 0m0.096s sys 0m0.012s
That's a huge difference, and I only have about 3,200 objects in each collection. My live collection already contains almost 2 million objects (after 12 days online). I can't wait to benchmark the live data with these scripts. It looks very promising to me! :)
Thank you all, Stack Overflow rocks! :)