Let's say we are designing a new system and have decided to use MongoDB as the primary database. The data schema is very similar to a blog with [growing] comments.
In the book "MongoDB Developers", Tip #6: Do not embed fields that have unbound growth, it says it is inefficient to constantly append data to the end of an array (but it also hinted that comments are a "wierd edge case").
Let's say our new system is like those "comments" in a blog - dynamically growing all the time, but also sometimes changing or some being deleted.
So, having recognized that there could be a performance issue using MongoDB, what other alternative database (must be horizontally scalable database) could serve this purpose? (We don't mind using MongoDB as our primary database, but separate the "comments" to a alternative database. What are the options available?
Notes:
The Redis feature of having Hashes as its data types fit the description of our "comments" data structure - constantly growing but sometimes modified or deleted - BUT we do not need a pure in-memory database (we don't wish to dedicate so much RAM when the data can be persisted to the disk) - otherwise this would be a good fit for our problem
What about using CouchDB? We are not investigated about this product. How does it perform with a growing data structure?
1. Why is MongoDB high performance? Ad hoc queries, indexing, and real time aggregation provide powerful ways to access data. MongoDB is a distributed database by default, which allows for expansive horizontal scalability without any changes to application logic.
MongoDB is one of the systems that can be trusted in achieving these factors. Big Data refers to massive data that is fast-changing, can be quickly accessed and highly available for addressing needs efficiently.
Because of these distinctive requirements, NoSQL (non-relational) databases, such as MongoDB, are a powerful choice for storing big data.
MongoDB is an open source NoSQL database. As a non-relational database, it can process structured, semi-structured, and unstructured data. It uses a non-relational, document-oriented data model and a non-structured query language. MongoDB is highly flexible and enables you to combine and store multiple types of data.
To add to what Thilo said above, the reason to "not embed fields that have unbound growth" is because this type of document size expansion can cause MongoDB to have to move the document if it exceeds the current space allocated to it. You can read more about this in the Padding Factor section of the documentation.
Those types of moves are relatively expensive, especially if they happen frequently. Therefore limiting the size (essentially bounding that growth) of the comments equivalent in your main collection (most recent X etc.) and perhaps even pre-populating that document field (essentially manual padding) to reduce the moves caused by comment additions/changes may well be worth it for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With