Scenario:
10,000,000 records/day.
Each record: visitor, day of visit, cluster (where we saw the visitor), metadata.
What we want to know from this information is, for each visitor, which clusters they were seen in and on which dates.
The model I settled on in order to query this information easily is:
{
    VisitorId: 1,
    ClusterVisit: [
        {clusterId: 1, dates: [date1, date2]},
        {clusterId: 2, dates: [date1, date3]}
    ]
}
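A lookup against this model for one visitor/cluster pair might look like the sketch below. The collection name `visits` and the use of the positional projection are assumptions, not something stated in the post:

```javascript
// Build the filter and projection to fetch one visitor's dates for a given
// cluster from the nested model above. "visits" is an assumed collection name.
function buildClusterLookup(visitorId, clusterId) {
  return {
    filter: { VisitorId: visitorId, "ClusterVisit.clusterId": clusterId },
    // Positional projection: return only the matched ClusterVisit element.
    projection: { "ClusterVisit.$": 1 },
  };
}

// In the mongo shell this would be run as:
//   db.visits.find(
//     { VisitorId: 1, "ClusterVisit.clusterId": 1 },
//     { "ClusterVisit.$": 1 }
//   )
```

Note that this find only matches when both the visitor and the cluster element already exist, which is why the import below needs a second, upserting step.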
Index:
I also split groups of clusters into different collections so the data can be accessed more efficiently.
Importing: first we search for the VisitorId - ClusterId combination and $addToSet the date.
Second, if the first step doesn't match, we upsert:
$addToSet: {VisitorId: 1,
    ClusterVisit: [{clusterId: 1, dates: [date1]}]
}
Between the first and second import steps I cover the cases where the clusterId doesn't exist or the VisitorId doesn't exist.
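The two-step import described above can be sketched as follows. The update documents are built as plain objects; the collection name `visits`, the `$ne` guard on the upsert step, and the field names are assumptions based on the schema in the post:

```javascript
// Step 1: the (VisitorId, clusterId) pair already exists -> add the date to
// the matching array element via $addToSet and the positional operator.
function buildStepOne(visitorId, clusterId, date) {
  return {
    filter: { VisitorId: visitorId, "ClusterVisit.clusterId": clusterId },
    update: { $addToSet: { "ClusterVisit.$.dates": date } },
  };
}

// Step 2: the pair does not exist -> upsert, guarding with $ne so we never
// append a duplicate clusterId element. If the VisitorId is missing too,
// the upsert creates the whole document.
function buildStepTwo(visitorId, clusterId, date) {
  return {
    filter: { VisitorId: visitorId, "ClusterVisit.clusterId": { $ne: clusterId } },
    update: { $addToSet: { ClusterVisit: { clusterId: clusterId, dates: [date] } } },
    options: { upsert: true },
  };
}

// In the mongo shell:
//   db.visits.update(s1.filter, s1.update)                  // step 1
//   db.visits.update(s2.filter, s2.update, { upsert: true }) // step 2
```

Each incoming record therefore touches the same ever-growing document, which is exactly what makes this model slow at scale, as described below.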
Problems: updates/inserts/upserts become totally inefficient (nearly impossible) as the collection grows, I guess because the documents keep growing every time a new date is added. The model is also difficult to maintain (mostly $unset-ing dates).
I have a collection with more than 50,000,000 documents that I can't grow any more; it updates at only ~100 records/sec.
I think the model I'm using is not the best for this volume of data. What do you think would work best to get more upserts/sec and query the information FAST, before I resort to sharding, which will take more time while I learn it and get confident with it?
I have an x1.large instance on AWS with RAID 10 across 10 disks.
Arrays are expensive to work with on large collections: map-reduce, aggregation...
Try .explain(): MongoDB 'count()' is very slow. How do we refine or work around it?
Add explicit hints for index: Simple MongoDB query very slow although index is set
A full heap?: Insert performance of node-mongodb-native
The end of memory space for collection: How to improve performance of update() and save() in MongoDB?
Special read clustering: http://www.colinhowe.co.uk/2011/02/23/mongodb-performance-for-data-bigger-than-memor/
Global write lock?: mongodb bad performance
Slow logs performance track: Track MongoDB performance?
Rotate your logs: Does logging output to an output file affect mongoDB performance?
Use profiler: http://www.mongodb.org/display/DOCS/Database+Profiler
Move some collection caches to RAM: MongoDB preload documents into RAM for better performance
Some ideas about collection allocation size: MongoDB data schema performance
Use separate collections: MongoDB performance with growing data structure
A single query can only use one index (better is a compound one): Why is this mongodb query so slow?
A missing key?: Slow MongoDB query: can you explain why?
Maybe shards: MongoDB's performance on aggregation queries
More Stack Overflow links on improving performance: https://stackoverflow.com/a/7635093/602018
A good starting point for further education on sharding and replica sets is: https://education.10gen.com/courses
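Tying several of the points above together ("use separate collections", "better is a compound one"): a flatter model with one document per visitor/cluster/day keeps documents a fixed size, so inserts are append-only and updates never have to relocate a growing document. A sketch, in which the collection name `visits_flat` and all field names are assumptions:

```javascript
// One small, fixed-size document per (visitor, cluster, day) visit.
function buildFlatVisit(visitorId, clusterId, date) {
  return { VisitorId: visitorId, ClusterId: clusterId, Date: date };
}

// A unique compound index both serves the common lookup ("dates for a
// visitor in a cluster") and de-duplicates repeat visits on the same day,
// replacing the $addToSet logic of the nested model.
const flatIndex = { VisitorId: 1, ClusterId: 1, Date: 1 };

// In the mongo shell:
//   db.visits_flat.ensureIndex({ VisitorId: 1, ClusterId: 1, Date: 1 }, { unique: true })
//   db.visits_flat.insert(buildFlatVisit(1, 2, ISODate("2013-05-01")))
//   db.visits_flat.find({ VisitorId: 1, ClusterId: 2 }).explain()
```

Since a single query can use only one index, the compound index above covers queries on VisitorId alone, on VisitorId + ClusterId, and on the full triple, which matches the access patterns described in the question.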