Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using MongoDB vs MySQL with lots of JSON fields?

Tags:

There is a microblogging type of application. Two main basic database stores zeroed upon are: MySQL or MongoDB.

I am planning to denormalize lot of data I.e. A vote done on a post is stored in a voting table, also a count is incremented in the main posts table. There are other actions involved with the post too (e.g. Like, vote down).

If I use MySQL, some of the data better suits as JSON than fixed schema, for faster lookups.

E.g.

POST_ID   |  activity_data  213423424 | { 'likes': {'count':213,'recent_likers' :              ['john','jack',..fixed list of recent N users]} , 'smiles' :               {'count':345,'recent_smilers' :              ['mary','jack',..fixed list of recent N users]}  } 

There are other components of the application as well, where usage of JSON is being proposed. So, to update a JSON field, the sequence is:

  1. Read the JSON in python script.

  2. Update the JSON

  3. Store the JSON back into MySQL.

It would have been single operation in MongoDB with atomic operations like $push,$inc,$pull etc. Also document structure of MongoDB suits my data well.

My considerations while choosing the data store.

Regarding MySQL:

  1. Stable and familiar.
  2. Backup and restore is easy.
  3. Some future schema changes can be avoided using some fields as schemaless JSON.
  4. May have to use layer of memcached early.
  5. JSON blobs will be static in some tables like main Posts, however will be updated alot in some other tables like Post votes and likes.

Regarding MongoDB:

  1. Better suited to store schema less data as documents.
  2. Caching might be avoided till a later stage.
  3. Sometimes the app may become write intensive, MongoDB can perform better at those points where unsafe writes are not an issue.
  4. Not sure about stability and reliability.
  5. Not sure about how easy is it to backup and restore.

Questions:

  1. Shall we chose MongoDB if half of data is schemaless, and is being stored as JSON if using MySQL?
  2. Some of the data like main posts is critical, so it will be saved using safe writes, the counters etc will be saved using unsafe writes. Is this policy based on importance of data, and write intensiveness correct?

  3. How easy is it to monitor, backup and restore MongoDB as compared to MySQL? We need to plan periodic backups ( say daily ), and restore them with ease in case of disaster. What are the best options I have with MongoDB to make it a safe bet for the application.

Stability, backup, snapshots, restoring, wider adoption I.e.database durability are the reasons pointing me to use MySQL as RDBMS+NoSql even though a NoSQL document storage could serve my purpose better.

Please focus your views on the choice between MySQL and MongoDB considering the database design I have in mind. I know there could be better ways to plan database design with either RDBMS or MongoDB documents. But that is not the current focus of my question.

UPDATE : From MySQL 5.7 onwards, MySQL supports a rich native JSON datatype which provides data flexibility as well as rich JSON querying.

https://dev.mysql.com/doc/refman/5.7/en/json.html

like image 727
DhruvPathak Avatar asked Oct 17 '12 12:10

DhruvPathak


People also ask

Which database is best for storing JSON data?

SQL Server and Azure SQL Database have native JSON functions that enable you to parse JSON documents using standard SQL language. You can store JSON documents in SQL Server or SQL Database and query JSON data as in a NoSQL database.

Is it good to store JSON in MySQL?

MySQL supports a native JSON data type defined by RFC 7159 that enables efficient access to data in JSON (JavaScript Object Notation) documents. The JSON data type provides these advantages over storing JSON-format strings in a string column: Automatic validation of JSON documents stored in JSON columns.

Is MongoDB more efficient than MySQL?

MongoDB speed debate, MongoDB usually comes out as the winner. MongoDB can accept large amounts of unstructured data much faster than MySQL thanks to slave replication and master replication. Depending on the types of data that you collect, you may benefit significantly from this feature.

Should I use MySQL or MongoDB?

MySQL is a good choice if you are working with a legacy application that requires multi-row transactions and has structured data with a clear schema. MongoDB is a good choice if: You want high data availability along with automatic and instant data recovery.


2 Answers

So, to directly answer the questions...

Shall we chose mongodb if half of data is schemaless, and is being stored as JSON if using MySQL?

Schemaless storage is certainly a compelling reason to go with MongoDB, but as you've pointed out, it's fairly easy to store JSON in a RDBMS as well. The power behind MongoDB is in the rich queries against schemaless storage.

If I might point out a small flaw in the illustration about updating a JSON field, it's not simply a matter of getting the current value, updating the document and then pushing it back to the database. The process must all be wrapped in a transaction. Transactions tend to be fairly straightforward, until you start denormalizing your database. Then something as simple as recording an upvote can lock tables all over your schema.

With MongoDB, there are no transactions. But operations can almost always be structured in a way that allow for atomic updates. This usually involves some dramatic shifts from the SQL paradigms, but in my opinion they're fairly obvious once you stop trying to force objects into tables. At the very least, lots of other folks have run into the same problems you'll be facing, and the Mongo community tends to be fairly open and vocal about the challenges they've overcome.

Some of the data like main posts is critical , so it will be saved using safe writes , the counters etc will be saved using unsafe writes. Is this policy based on importance of data, and write intensiveness correct?

By "safe writes" I assume you mean the option to turn on an automatic "getLastError()" after every write. We have a very thin wrapper over a DBCollection that allows us fine grained control over when getLastError() is called. However, our policy is not based on how "important" data is, but rather whether the code following the query is expecting any modifications to be immediately visible in the following reads.

Generally speaking, this is still a poor indicator, and we have instead migrated to findAndModify() for the same behavior. On the occasion where we still explicitly call getLastError() it is when the database is likely to reject a write, such as when we insert() with an _id that may be a duplicate.

How easy is it to monitor,backup and restore Mongodb as compared to mysql? We need to plan periodic backups (say daily), and restore them with ease in case of disaster. What are the best options I have with mongoDb to make it a safe bet for the application?

I'm afraid I can't speak to whether our backup/restore policy is effective as we have not had to restore yet. We're following the MongoDB recommendations for backing up; @mark-hillick has done a great job of summarizing those. We're using replica sets, and we have migrated MongoDB versions as well as introduced new replica members. So far we've had no downtime, so I'm not sure I can speak well to this point.

Stability,backup,snapshots,restoring,wider adoption i.e.database durability are the reasons pointing me to use MySQL as RDBMS+NoSql even though a NoSQL document storage could serve my purpose better.

So, in my experience, MongoDB offers storage of schemaless data with a set of query primitives rich enough that transactions can often be replaced by atomic operations. It's been tough to unlearn 10+ years worth of SQL experience, but every problem I've encountered has been addressed by the community or 10gen directly. We have not lost data or had any downtime that I can recall.

To put it simply, MongoDB is hands down the best data storage ecosystem I have ever used in terms of querying, maintenance, scalability, and reliability. Unless I had an application that was so clearly relational that I could not in good conscience use anything other than SQL, I would make every effort to use MongoDB.

I don't work for 10gen, but I'm very grateful for the folks who do.

like image 196
Fuwjax Avatar answered Sep 22 '22 17:09

Fuwjax


I'm not going to comment on the comparisons (I work for 10gen and don't feel it's appropriate for me to do so), however, I will answer the specific MongoDB questions so that you can better make your decision.

Back-Up

Documentation here is very thorough, covering many aspects:

  • Block-Level Methods (LVM makes it very easy and quite a lot of folk do this)
  • With/Without Journaling
  • EBS Snapshots
  • General Snapshots
  • Replication (technically not back-up, however, a lot of folk use replica sets for their redundancy and back-up - not recommending this but it is done)

Until recently, there is no MongoDB equivalent of mylvmbackup but a nice guy wrote one :) In his words

Early days so far: it's just a glorified shell script and needs way more error checking. But already it works for me and I figured I'd share the joy. Bug reports, patches & suggestions welcome.

Get yourself a copy from here.

Restores

  • Formats etc

mongodump is completely documented here and mongorestore is here.

mongodump will not contain the indexes but does contain the system.indexes collection so mongorestore can rebuild the indexes when you restore the bson file. The bson file is the actual data whereas mongoexport/mongoimport are not type-safe so it could be anything (techically speaking) :)

Monitoring

Documented here.

I like Cacti but afaik, the Cacti templates have not kept up with the changes in MongoDB and so rely on old syntax so post 2.0.4, I believe there are issues.

Nagios works well but it's Nagios so you either love or hate it. A lot of folk use Nagios and it seems to provide them with great visiblity.

I've heard of some folk looking at Zappix but I've never used it so can't comment.

Additionally, you can use MMS, which is free and hosted externally. Your MongoDB instances run an agent and one of those agents communicate (using python code) over https to mms.10gen.com. We use MMS to view all performance statistics on the MongoDB instances and it is very beneficial from a high-level wide view as well as offering the ability to drill down. It's simple to install and you don't have to run any hardware for this. Many customers run it and some compliment it with Cacti/Nagios.

Help information on MMS can be found here (it's a very detailed, inclusive document).

like image 35
Mark Hillick Avatar answered Sep 19 '22 17:09

Mark Hillick