I'm building a simple accounting system where a user has many bills. Now I'm trying to decide whether bills should be their own collection or nested within the user. I'm leaning towards the former, but I've never done any NoSQL work, so I'm just going by trial and error and by what makes sense to me.
I understand that Mongo has a 4MB document size limit, which is what makes me think I should have a separate collection for bills, as these will accumulate daily and could eventually take up a large amount of space.
I'm just looking for opinions on the matter. Basically I'll be querying for bills of a user between different date periods (as you can imagine an accounting system would do).
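Not that it really matters, but I'm using Mongoid in a Rails 3 project. I figured I'd do something like:
    class User
      include Mongoid::Document
      references_many :bills  # a user's bills live in a separate collection
    end
    class Bill
      include Mongoid::Document
      referenced_in :user     # each bill stores the id of its owning user
    end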
Any comments or design suggestions are greatly appreciated.
Embedded documents are an efficient and clean way to store related data, especially data that's regularly accessed together. In general, when designing schemas for MongoDB, you should prefer embedding by default, and use references and application-side or database-side joins only when they're worthwhile.
One of the primary benefits of embedded relationships in MongoDB is read performance: a parent and its embedded children come back in a single query, whereas a referenced relationship needs an additional query to fetch the children. The benefit holds for large datasets too, as long as each document stays under the document size limit.
An embedded, or nested, document is a normal document stored inside another document within a MongoDB collection. Embedded documents are particularly useful when a one-to-many relationship exists between documents.
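To make the shape concrete, here is a minimal plain-Ruby sketch of a user document with bills embedded as an array of sub-documents, plus the kind of date-range lookup the question describes. It uses ordinary hashes rather than Mongoid or a real database, and the field names (`name`, `amount`, `due_on`) and the helper `bills_between` are made up for illustration.

```ruby
require "date"

# Hypothetical user document: the bills are nested inside the user itself.
USER = {
  "_id"   => 1,
  "name"  => "alice",
  "bills" => [
    { "amount" => 120.50, "due_on" => Date.new(2010, 1, 15) },
    { "amount" => 75.00,  "due_on" => Date.new(2010, 2, 3) },
    { "amount" => 310.25, "due_on" => Date.new(2010, 3, 20) }
  ]
}

# Returns the embedded bills whose due date falls inside [from, to].
def bills_between(user_doc, from, to)
  user_doc["bills"].select { |bill| (from..to).cover?(bill["due_on"]) }
end

selected = bills_between(USER, Date.new(2010, 1, 1), Date.new(2010, 2, 28))
puts selected.map { |bill| bill["amount"] }.inspect
```

Because the bills travel with the user, one read of the user document is enough to answer the date-range question; the filtering here happens in application code only because the sketch has no database behind it.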
1) Regarding the 4MB document limit, this is what "MongoDB: The Definitive Guide" says:
Documents larger than 4MB (when converted to BSON) cannot be saved to the database. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance. To see the BSON size (in bytes) of the document doc, run Object.bsonsize(doc) from the shell.
To give you an idea of how much 4MB is, the entire text of War and Peace is just 3.14MB.
In the end, it depends on how large you expect a user's bills to grow. I hope the excerpt above gives you an idea of the limits imposed by the document size. (Later MongoDB releases raised the limit to 16MB.)
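For a back-of-the-envelope feel for how long an embedded bills array could grow, the arithmetic can be sketched in plain Ruby. This is only a rough estimate under stated assumptions: it measures JSON bytes as a stand-in for BSON bytes (the two encodings differ, so use `Object.bsonsize` in the shell for real numbers), it ignores the rest of the user document, and the sample bill and its field names are hypothetical. The limit is a parameter, so another value can be plugged in.

```ruby
require "json"

LIMIT_BYTES = 4 * 1024 * 1024 # the 4MB limit quoted above

# Hypothetical bill; the field names are illustrative only.
SAMPLE_BILL = { "amount" => 120.50, "due_on" => "2010-01-15", "memo" => "electricity" }

# Rough size of one document, using JSON bytes as a proxy for BSON bytes.
def approx_bytes(doc)
  JSON.generate(doc).bytesize
end

# How many bills of roughly this size fit under the limit.
def bills_that_fit(bill, limit = LIMIT_BYTES)
  limit / approx_bytes(bill)
end

# Years until the limit is reached at a steady number of bills per day.
def years_until_full(bill, bills_per_day, limit = LIMIT_BYTES)
  bills_that_fit(bill, limit) / (bills_per_day * 365.0)
end

puts "approx bytes per bill: #{approx_bytes(SAMPLE_BILL)}"
puts "bills under the limit: #{bills_that_fit(SAMPLE_BILL)}"
puts "years at 5 bills/day:  #{years_until_full(SAMPLE_BILL, 5).round(1)}"
```

Running it with a bill that resembles your real data gives a quick sense of whether a single user could plausibly hit the limit within the lifetime of the application.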
2) A de-normalized schema (bills embedded in the user document) is the way to go if you know that you are never going to run global queries on bills (for example, retrieving the ten most recent bills entered into the system). With a de-normalized schema, you will have to use map-reduce to retrieve results for such queries.
A normalized schema (users and bills in separate documents) is a better choice if you want flexibility in how the bills are queried. However, since MongoDB doesn't support joins, you will have to run multiple queries every time you want to retrieve the bills corresponding to a user.
Given the use-case you mentioned, I'd go with de-normalized schema.
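The trade-off between the two schemas can be sketched in plain Ruby with in-memory arrays standing in for collections. This is not Mongoid or MongoDB code; the data, the field names (`created_on`, `user_id`) and the three helper methods are hypothetical, and the flattening step only mimics the extra work a global query needs on the embedded layout.

```ruby
require "date"

# De-normalized: each user document carries its own bills.
USERS_EMBEDDED = [
  { "_id" => 1, "bills" => [{ "amount" => 10, "created_on" => Date.new(2010, 1, 5) },
                            { "amount" => 20, "created_on" => Date.new(2010, 3, 1) }] },
  { "_id" => 2, "bills" => [{ "amount" => 30, "created_on" => Date.new(2010, 2, 9) }] }
]

# Normalized: bills live in their own collection and point back at a user.
BILLS = [
  { "user_id" => 1, "amount" => 10, "created_on" => Date.new(2010, 1, 5) },
  { "user_id" => 1, "amount" => 20, "created_on" => Date.new(2010, 3, 1) },
  { "user_id" => 2, "amount" => 30, "created_on" => Date.new(2010, 2, 9) }
]

# Global query on the embedded layout: every user document is visited and
# its bills are flattened before the most recent ones can be picked.
def recent_bills_embedded(users, limit)
  users.flat_map { |user| user["bills"] }.max_by(limit) { |bill| bill["created_on"] }
end

# Global query on the separate collection: pick directly from one collection.
def recent_bills_normalized(bills, limit)
  bills.max_by(limit) { |bill| bill["created_on"] }
end

# Per-user lookup on the separate collection: a second step after the user is known.
def bills_for_user(bills, user_id)
  bills.select { |bill| bill["user_id"] == user_id }
end

puts recent_bills_embedded(USERS_EMBEDDED, 2).map { |b| b["amount"] }.inspect
puts recent_bills_normalized(BILLS, 2).map { |b| b["amount"] }.inspect
puts bills_for_user(BILLS, 1).length
```

Both global queries return the same bills; the difference is that the embedded layout has to touch every user to get there, while the separate collection pays instead with an extra lookup whenever one user's bills are needed.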
3) Updates to a single document in MongoDB are atomic and serialized. That should answer Steve's concern.
You may find these slides helpful. http://www.slideshare.net/kbanker/mongodb-meetup
You may also look at MongoDB's Production Deployments page. You may find the SF.net slides helpful.
One question you might want to consider: will you ever need to reference the bills individually, apart from their membership in a user? If so, it'll be simpler if they have an independent existence.
Apart from that, the size limit issue you've already identified is a good reason to split them off.
There might be a transactional issue as well. If you're writing a large user document with many embedded bills, what happens when different connections write changes to the same user at nearly the same time? I don't know enough about Mongo to say how it would resolve this. My guess is that if the writes contained different added bills you'd get both, but if they contained different changes to existing bills you'd get overwrites. Hopefully someone else will comment on this, but at the very least I'd test it. If you're writing the bills to a separate collection, this isn't a concern.
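The scenario can be rehearsed without a database. The following is a single-threaded, in-memory simulation in plain Ruby, not MongoDB itself: a hash stands in for the collection, and the helper names (`replace_document`, `push_bill`, and so on) are invented for this sketch. It contrasts two clients that each write back a whole user document with two clients that each apply a targeted append to the stored bills array.

```ruby
# Fresh in-memory "collection" holding one user with one bill.
def new_store
  { 1 => { "name" => "alice", "bills" => [{ "amount" => 10 }] } }
end

# Deep copy, so each simulated client works on its own snapshot.
def snapshot(doc)
  Marshal.load(Marshal.dump(doc))
end

# Whole-document write: the stored document is replaced by the client's copy.
def replace_document(store, id, doc)
  store[id] = doc
end

# Targeted write: appends one bill to the stored array and touches nothing else.
def push_bill(store, id, bill)
  store[id]["bills"] << bill
end

# Two clients read the same user, each adds a bill locally, each saves the
# whole document. The second save overwrites the first client's addition.
def whole_document_writes
  store = new_store
  client_a = snapshot(store[1])
  client_b = snapshot(store[1])
  client_a["bills"] << { "amount" => 20 }
  client_b["bills"] << { "amount" => 30 }
  replace_document(store, 1, client_a)
  replace_document(store, 1, client_b)
  store[1]["bills"].map { |bill| bill["amount"] }
end

# Two clients each send only the new bill; both additions survive.
def targeted_pushes
  store = new_store
  push_bill(store, 1, { "amount" => 20 })
  push_bill(store, 1, { "amount" => 30 })
  store[1]["bills"].map { |bill| bill["amount"] }
end

puts whole_document_writes.inspect
puts targeted_pushes.inspect
```

In the first run the bill for 20 is lost because the last full write wins; in the second run all three bills remain. Whether a real application sees the first or the second behaviour depends on how its driver or mapper issues the writes, so testing against an actual database is still worthwhile.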