
Is MongoDB the right DB for a community site?

I'm creating a community site with Node.js and Express, and almost all Express tutorials and examples use MongoDB, so I checked it out. The only DB I've used so far is MySQL, but I'm not that familiar with it, so I wouldn't mind reading up on MongoDB. Mongo looks quite nice and the document model might be beneficial, and with Mongoose it's easy to use. But I have some questions, so that I don't spend a lot of time learning MongoDB if it doesn't fit at all:

  1. I've read that MongoDB is unreliable if you run it on only one machine and that you might encounter data loss. Is that right? The project is not big enough for me to afford another server, and data loss is a total no-go! Imagine some forum posts just disappearing. But I guess people wouldn't use it if that happened.

  2. The site will contain a self-built forum and I'm not sure whether a relational DB would be better. You could save threads with embedded posts and so on, but I have no idea how to search them, as Mongo doesn't support full-text search. What do you think?

  3. When should you use embedded documents in Mongo? Example: a user can post status updates like on Twitter. Would you save these updates in the user document? There can be a lot of updates. Or would you use one document per update and link it to the user ID? 3.1 And how do you query across multiple documents? Say you want to fetch the last 10 status updates of your friends. You can do that with JOINs in MySQL.

  4. Is there a way to use auto-incrementing IDs for documents like in MySQL? A user, for example, should have a unique integer key, but I don't want some random number like the one Mongo generates, in order to keep the user IDs small.

  5. How do you handle race conditions in Mongoose? You load a document from the DB, edit something and save it later. But maybe it has already changed in the meantime.

Xomby asked Nov 18 '11 12:11


2 Answers

To address each question separately:

  1. No, that is not true anymore. Older versions of MongoDB did not have journaling, but current versions do, and from v2 on it is activated by default. However, you should use SafeMode at the driver level, which ensures that the communication between driver and database was successful.

  2. Embedded posts and threads might not be the best choice. We've built a similar thing, and we're using a flat collection where each post stores the ParentId and the ParentThreadId. There are pros and cons to embedding, but the arguments for our decision were:

    a) Often, we only want to fetch the most recent comments site-wide or the n most recent comments in a given thread, neither of which can really be done using embedded documents.

    b) If you have a lot of people writing about the same topic at the same time, you need to be careful about concurrency. This can be solved, but we felt safer using different objects that can't really interfere, even if you make mistakes.

    c) As Joe points out, you'll have to handle full-text search in a different system.

  3. Embedded documents are not so well suited if you have a lot of updates, because the container (the collection item that contains the embedded objects) will grow. When it grows, MongoDB will have to reallocate it, which may take longer and fragments the data.

    3(a). For status updates of friends, using a fan-out strategy makes sense. I answered a similar question yesterday.

  4. Don't use auto-increment numbers. This is a flawed design by default, because it does not really work that well in a distributed environment. For the DB, it doesn't make a difference whether it stores an int with value 0x00000001 or one with value 0xfa9ac7335. There's no point in keeping the numbers small. I'd go with the Mongo ObjectId or a Guid/UUID. The former also contains a timestamp, btw.

  5. I haven't used mongoose, but in general, there are the typical strategies of pessimistic and optimistic locks.
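
To make the optimistic variant from point 5 a bit more concrete, here is a minimal sketch in plain JavaScript. It is not Mongoose code: the `posts` map, the `version` field and the `load`/`saveIfUnchanged` helpers are all made up for illustration and only stand in for a collection and its driver calls. The idea is that a save goes through only if the stored version still matches the one you loaded.

```javascript
// In-memory stand-in for a collection; every name here is hypothetical.
const posts = new Map();
posts.set(1, { _id: 1, body: 'first draft', version: 0 });

// Return a copy of the stored document, as a driver would when loading it.
function load(id) {
  return { ...posts.get(id) };
}

// Write only if the stored version still matches the version we loaded.
// Against MongoDB the same check would go into the update filter,
// e.g. { _id: id, version: loadedVersion }, together with $inc on version.
function saveIfUnchanged(doc) {
  const stored = posts.get(doc._id);
  if (!stored || stored.version !== doc.version) {
    return false; // someone else saved first; reload and retry
  }
  posts.set(doc._id, { ...doc, version: doc.version + 1 });
  return true;
}

// Two clients load the same post and edit it independently.
const a = load(1);
const b = load(1);
a.body = 'edited by A';
b.body = 'edited by B';

const firstSave = saveIfUnchanged(a);  // accepted, version becomes 1
const secondSave = saveIfUnchanged(b); // rejected, B holds a stale copy
```

When the save is rejected, the usual reaction is to reload the document, reapply the change and try again, or to show the user that the post has changed in the meantime.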

mnemosyn answered Sep 22 '22 02:09


  1. By default MongoDB writes are fire and forget, so if something goes wrong there is a possibility of data loss. You can use SafeMode, which tells you whether the write was successful or not, and then handle it any way you want. Having said that, I've not experienced any lost data myself. Multiple servers would mean replication, which is used for failover: if one node goes down, another can automatically be promoted to master.

  2. If you want full-text search, then you can't really do it with Mongo. You could tokenize each post and store the words in an embedded array on the document. That array would be indexed, so you could query for each of those words. The problem is that you then have no relevancy. You could build in some relevancy logic with Map Reduce, but this would slow down your query. If you really want fast full-text search, you should look at Solr or Elasticsearch.

  3. Personally, I wouldn't store status updates in an embedded document. I'd put them all in a separate collection with a user identifier. There are no joins in Mongo, so you'd have to do two queries: one to get the IDs of your friends, another to get the status updates. Depending on the size of your collection, with the right indexes in place this would be extremely fast even though it is two queries.

  4. I don't think you can use an auto-incrementing integer for an ID at the Mongo level. You could handle it yourself in the application, as you can use any field for the identifier. When adding a new document, you'd have to query the collection to get the highest ID and increment it. The Mongo ObjectId is made up of machine ID, process ID, timestamp and some randomness to create a unique key.

  5. I'm not familiar with Mongoose.
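
The two-query approach from point 3 could look roughly like this. It is only a sketch in plain JavaScript over in-memory arrays: the `users` and `statuses` arrays, the field names and the `friendFeed` helper are assumptions for illustration, not a real schema. Against MongoDB, the second step would be something along the lines of `find({ userId: { $in: friendIds } }).sort({ createdAt: -1 }).limit(10)`, backed by an index on the fields involved.

```javascript
// Hypothetical in-memory data standing in for two collections.
const users = [
  { _id: 'u1', friendIds: ['u2', 'u3'] },
  { _id: 'u2', friendIds: ['u1'] },
  { _id: 'u3', friendIds: ['u1'] },
];
const statuses = [
  { userId: 'u2', text: 'hello', createdAt: 1 },
  { userId: 'u4', text: 'not a friend', createdAt: 2 },
  { userId: 'u3', text: 'lunch', createdAt: 3 },
  { userId: 'u2', text: 'back', createdAt: 4 },
];

function friendFeed(userId, limit) {
  // Query 1: look up the user's friend IDs.
  const user = users.find((u) => u._id === userId);
  if (!user) return [];
  const friendIds = new Set(user.friendIds);

  // Query 2: statuses written by those friends, newest first, capped at limit.
  // filter() returns a new array, so sorting does not reorder `statuses`.
  return statuses
    .filter((s) => friendIds.has(s.userId))
    .sort((x, y) => y.createdAt - x.createdAt)
    .slice(0, limit);
}

// The two most recent updates from u1's friends.
const feed = friendFeed('u1', 2);
```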

Joe answered Sep 20 '22 02:09