Just wanted an opinion, or at least a rule of thumb, about which is better for a CouchDB database structure. Is it better to keep all related data for an item in a single document, or to spread parts of each item across many documents?
Let me illustrate what I mean with an example. I currently log 4 events from our system at 1 minute intervals; let's call them event_1, event_2, event_3 and event_4. Data is stored for each of the 4 events regardless of value (you'll always get a value, even if everything is okay).
Option 1: Group by event, and append new timestamp/value pairs to each event's document...
{
  "event_1": [
    { "timestamp": ..., "value": ... },
    { "timestamp": ..., "value": ... },
    { "timestamp": ..., "value": ... }
    ...etc
  ]
},
{
  "event_2": [
    { "timestamp": ..., "value": ... },
    { "timestamp": ..., "value": ... },
    { "timestamp": ..., "value": ... }
    ...etc
  ]
},
{
  "event_3": [
    { "timestamp": ..., "value": ... },
    { "timestamp": ..., "value": ... },
    { "timestamp": ..., "value": ... }
    ...etc
  ]
}
...etc
Option 2: Keep a huge list of documents, one per timestamp holding that minute's values (which is how they're actually delivered from the system)?
{
  "timestamp": ...,
  "event_1": ...,
  "event_2": ...,
  "event_3": ...,
  "event_4": ...
},
{
  "timestamp": ...,
  "event_1": ...,
  "event_2": ...,
  "event_3": ...,
  "event_4": ...
},
{
  "timestamp": ...,
  "event_1": ...,
  "event_2": ...,
  "event_3": ...,
  "event_4": ...
}
...etc
I'm currently using the 2nd option, but was curious to see people's opinions on what would be considered best practice. I'm starting to think that Option 1 might be better, since that matches the way I am reporting: results are grouped by event (shown as a line graph for each event).
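For what it's worth, even with Option 2 you can still group results by event for the line graphs using a CouchDB view. As a minimal sketch (assuming the Option 2 field names timestamp and event_1 ... event_4 shown above), a map function could emit one row per event per reading:

function (doc) {
  // One row per event per reading, keyed so rows sort by event first,
  // then by time within each event.
  if (doc.timestamp) {
    emit(["event_1", doc.timestamp], doc.event_1);
    emit(["event_2", doc.timestamp], doc.event_2);
    emit(["event_3", doc.timestamp], doc.event_3);
    emit(["event_4", doc.timestamp], doc.event_4);
  }
}

Querying the view with ?startkey=["event_2"]&endkey=["event_2",{}] then returns just event_2's time series, already sorted by timestamp, so each graph is a single range query.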
I would definitely prefer your Option 2.
Since CouchDB keeps old revisions of its documents (until the database is compacted), Option 1 would consume a huge amount of disk space: every time you append a value, the entire document is rewritten, so you store the new value plus a fresh copy of all the old ones. With Option 2 you only ever store the new values, without touching the old ones.
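To make the two write patterns concrete, here is a minimal sketch against CouchDB's HTTP API (the database name events, the host, and the field values are assumptions for illustration):

# Option 1: appending is a read-modify-write of one ever-growing document.
# You must GET the document (for its current _rev), append the new reading,
# and PUT the whole thing back; the superseded revision stays on disk
# until the database is compacted.

# Option 2: each reading is a small, write-once document of its own.
curl -X POST http://localhost:5984/events \
     -H 'Content-Type: application/json' \
     -d '{"timestamp": "2013-06-01T12:01:00Z",
          "event_1": 1.2, "event_2": 0.4, "event_3": 7, "event_4": 0}'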