
MongoDB performs badly on 600k objects: alternative DB? Optimizations?

I've started a new project using node.js and MongoDB, and after almost 2 days I have gathered about 600k objects in MongoDB. I'm already noticing a huge negative impact on performance, and I'm starting to wonder whether I should move to another DB while I still can, or stick with Mongo and do some (more) optimizations.

Basically I'm storing coordinates like this:

[x1] => 687
[y1] => 167
[x2] => 686
[y2] => 167
[c] => 0
[s] => 0
[m] => 1299430700312
[_id] => MongoId Object (
    [$id] => 4d73bd2c82bb5926780001ec
)

Nothing more. My queries look like this:

{'$or': [ { x1: {'$gte' : 0, '$lt' : 1000 }, y1: {'$gte' : 0, '$lt' : 1000 } }, { x2: {'$gte' : 0, '$lt' : 1000 }, y2: {'$gte' : 0, '$lt' : 1000 } } ] }
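
A minimal sketch of how a query of this shape could be built for an arbitrary box. The helper names `range` and `buildBoxQuery` and their parameters are made up for illustration; only the field names x1, y1, x2 and y2 come from the question.

```javascript
// Hypothetical helper: half-open range condition lo <= value < hi.
function range(lo, hi) {
  return { $gte: lo, $lt: hi };
}

// Hypothetical helper: builds the range query for the box
// [minX, maxX) x [minY, maxY). A document matches if either of its
// two endpoints (x1, y1) or (x2, y2) lies inside the box.
function buildBoxQuery(minX, minY, maxX, maxY) {
  return {
    $or: [
      { x1: range(minX, maxX), y1: range(minY, maxY) },
      { x2: range(minX, maxX), y2: range(minY, maxY) },
    ],
  };
}

// Same box as in the question: 0 <= x < 1000, 0 <= y < 1000.
const query = buildBoxQuery(0, 0, 1000, 1000);
console.log(JSON.stringify(query));
```

Each branch of the `$or` closes its own braces, so the two endpoint conditions stay independent of each other.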

I've tried setting an index on each of the fields (x1, y1, x2, y2) as well as the compound indexes {x1:1,y1:1} and {x2:1,y2:1}. Furthermore, I only fetch the fields I actually need. Still, a query with a result set of ~40k rows takes 2-8 seconds. By the way, performing the same query in PHP died with an Out-of-Memory message (256MB RAM).

The machine is an Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz with 8GB of RAM, so it's not the most dusty one in the rack ;)

I'm really running out of ideas, and I see millions and millions of rows coming in the next weeks. As you probably noticed, the rows are relatively small. Would MySQL with partitioning perform better? Any other NoSQL DB?

And please, no trolling about "2-8secs isn't slow": it's becoming a problem already. When a couple of uncached requests hit the machine at the same time, the load rises to 4 with fewer than 10 users accessing it.

Steffen asked Mar 08 '11 00:03


1 Answer

Thanks to all of you who took the time to think about my issue. The suggestion to use geospatial indexes seems to be the answer I was looking for. Besides the fact that these indexes are more effective for MongoDB, the way of querying entire boxes simply rocks!

To give some facts: I've just started to rewrite my code and collection data and began with a simple comparison. My data before looked like this:


[x1] => 190
[y1] => 18
[x2] => 192
[y2] => 18
[c] => 0
[s] => 0
[b] => Array (
    [0] => 0
    [1] => 0
)
[m] => 1299365242802
[r] => 32596
[_id] => MongoId Object (
    [$id] => 4d72bd7af0528ea82f000003
)

The indexes were:


{x1:1,y1:1}, {x2:1,y2:1}

Now my data looks like this:


[_id] => MongoId Object (
    [$id] => 4d825799b15953b90d000000
)
[coords] => Array (
    [x] => 190
    [y] => 18
)
[x2] => 192
[y2] => 18
[s] => 0
[c] => 0
[m] => 1299365242802
[r] => 32596
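
The change of layout shown above could be sketched as a small conversion function. This is only an illustration: the name `toGeoDocument` is hypothetical, and dropping the `b` field is inferred from it being absent in the new listing.

```javascript
// Hypothetical helper: converts a document from the old layout
// (x1/y1/x2/y2 at the top level) to the new one, where the first
// endpoint moves into an embedded `coords` object.
function toGeoDocument(old) {
  return {
    _id: old._id,
    coords: { x: old.x1, y: old.y1 }, // the point the 2d index will cover
    x2: old.x2,
    y2: old.y2,
    s: old.s,
    c: old.c,
    m: old.m,
    r: old.r,
    // `b` is not carried over (assumption based on the new listing)
  };
}

// Sample values taken from the old-format listing; the _id is a plain
// string here instead of a MongoId object to keep the sketch standalone.
const oldDoc = {
  _id: "4d72bd7af0528ea82f000003",
  x1: 190, y1: 18, x2: 192, y2: 18,
  c: 0, s: 0, b: [0, 0],
  m: 1299365242802, r: 32596,
};
console.log(toGeoDocument(oldDoc));
```

Note that in this layout only the first endpoint sits in `coords`; `x2` and `y2` stay ordinary fields.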

The index is now:


{coords:'2d'}
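
The answer does not show the box query itself, so the following is a sketch under assumptions rather than the original script. It assumes the current operator name `$geoWithin` (at the time of the answer the operator was `$within`), and it assumes `min`/`max` bounds were set on the index, since a 2d index defaults to the range [-180, 180) and pixel coordinates such as 190 fall outside it. The helper name `buildGeoBoxQuery` and the bound of 4096 are hypothetical.

```javascript
// Index definition: a 2d index on the embedded `coords` field.
const indexSpec = { coords: "2d" };

// Assumption: explicit bounds for pixel coordinates. The value 4096 is
// a made-up canvas size, not something stated in the answer.
const indexOptions = { min: 0, max: 4096 };

// Hypothetical helper: box query against the 2d index.
// $box takes the bottom-left and top-right corners of the rectangle.
function buildGeoBoxQuery(minX, minY, maxX, maxY) {
  return {
    coords: { $geoWithin: { $box: [[minX, minY], [maxX, maxY]] } },
  };
}

// With the official Node.js driver these would be used roughly as:
//   await collection.createIndex(indexSpec, indexOptions);
//   const docs = await collection
//     .find(buildGeoBoxQuery(0, 0, 400, 400))
//     .toArray();
console.log(JSON.stringify(buildGeoBoxQuery(0, 0, 400, 400)));
```

Compared with the `$or` of four range conditions, the whole box becomes a single condition on one indexed field.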

I compared two scripts. The first one queries a box of 400x400 pixels from the old collection, and it took:



real    0m0.375s
user    0m0.348s
sys     0m0.021s


The second script queries for the same box, but uses the geospatial index:

real    0m0.107s
user    0m0.096s
sys     0m0.012s

That's a huge difference, and I only have about 3200 objects in each of my collections. My live database/collection already contains almost 2 million objects now (after 12 days online). I can't wait to benchmark the live data with these scripts. It looks very promising to me! :)

Thank you all, Stack Overflow rocks! :)

Steffen answered Sep 19 '22 02:09
