Keeping integrity between two separate data stores during backups (MySQL and MongoDB)

I have an application where the relational data fits naturally into MySQL. Other data has a constantly evolving schema and no relations, so the natural way to store it seemed to be MongoDB documents. My issue is that one of my documents references a MySQL primary key. So far this has worked without any issues, but my concern is that once production traffic comes in and we start working with backups, the two stores may drift out of sync: if a document changes, its restored copy might not point to the correct ID in the MySQL database. The only way to guarantee consistency to a certain degree would be to shut down the application while taking backups, which doesn't make much sense.

Other people must deploy a similar strategy. What is the best way to ensure data integrity between the two data stores, particularly during backups?

randombits asked Mar 08 '12 17:03


2 Answers

MySQL Perspective

All your MySQL data would have to use InnoDB. Then you could make a snapshot of the MySQL Data as follows:

MYSQLDUMP_OPTIONS="--single-transaction --routines --triggers"
# --single-transaction gives a consistent InnoDB snapshot without locking tables
mysqldump -u... -p... ${MYSQLDUMP_OPTIONS} --all-databases > MySQLData.sql

This will create a clean point-in-time snapshot of all MySQL Data as a single transaction.

For instance, if you start this mysqldump at midnight, all data in the mysqldump output will be from midnight. Data can still be added to MySQL (provided all your data uses the InnoDB Storage Engine) and you can have MongoDB reference any new data added to MySQL after midnight, even if it is during the backup.

If you have any MyISAM tables, you need to convert them to InnoDB. Let's cut to the chase. Here is how to generate a script that converts all your user-defined MyISAM tables to InnoDB:

MYISAM_TO_INNODB_CONVERSION_SCRIPT=/root/ConvertMyISAMToInnoDB.sql
echo "SET SQL_LOG_BIN = 0;" > ${MYISAM_TO_INNODB_CONVERSION_SCRIPT}
mysql -u... -p... -AN -e"SELECT CONCAT('ALTER TABLE ',table_schema,'.',table_name,' ENGINE=InnoDB;') InnoDBConversionSQL
FROM information_schema.tables
WHERE engine='MyISAM'
AND table_schema NOT IN ('information_schema','mysql','performance_schema')
ORDER BY (data_length+index_length)" >> ${MYISAM_TO_INNODB_CONVERSION_SCRIPT}

Note that this only writes the ALTER TABLE statements to a file. When you are ready to convert, execute the generated file:

mysql -u... -p... < ${MYISAM_TO_INNODB_CONVERSION_SCRIPT}

Any system-related MyISAM tables (in the mysql, information_schema, and performance_schema databases) are excluded and should not be touched anyway.

MongoDB Perspective

I cannot speak for MongoDB in depth, for I know very little of it. Still, on the MongoDB side, if you set up a replica set for your MongoDB data, you could run mongodump against a secondary. Since mongodump is not point-in-time, you would have to disconnect the secondary (to stop changes from coming over), perform the mongodump against it, and then reconnect it to its primary. Check with your developers or with 10gen whether mongodump can be used against a disconnected replica set member.
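One way to sketch that idea without fully disconnecting the member is to freeze a secondary with db.fsyncLock(), dump it, then unlock it. This is only a sketch: the host name replica-host and the output path are placeholder assumptions, and you should confirm the locking behaviour against your MongoDB version. The commands are written to a small script for review rather than run directly against a live server:

```shell
# Sketch only: replica-host and /backups/mongo are placeholder values.
# Writes the backup commands to mongo_backup.sh for inspection.
cat > mongo_backup.sh <<'EOF'
#!/bin/sh
mongo --host replica-host --eval "db.fsyncLock()"    # flush and block writes on the secondary
mongodump --host replica-host --out /backups/mongo   # dump the frozen member
mongo --host replica-host --eval "db.fsyncUnlock()"  # let it resume replicating
EOF
chmod +x mongo_backup.sh
```

The fsyncUnlock step must run even if mongodump fails, otherwise the secondary stays frozen and falls behind the primary.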

Common Objectives

If point-in-time truly matters to you, please make sure all OS clocks have the same synchronized time and timezone. If you have to perform such a synchronization, you must restart mysqld and mongod. Then your crontab jobs for mysqldump and mongodump will go off at the same time. Personally, I would delay the mongodump by about 30 seconds to make sure the IDs from MySQL that you want referenced in MongoDB are accounted for.

If you have mysqld and mongod running on the same server, you do not need any MongoDB replication. Just start the mysqldump at 00:00:00 (midnight) and the mongodump at 00:00:30 (30 seconds after midnight).
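Cron only has minute resolution, so the 30-second offset has to come from a sleep inside the job itself. A sketch of the two crontab entries (backup paths and elided credentials are placeholders):

```
# m h dom mon dow  command
0 0 * * * mysqldump -u... -p... --single-transaction --routines --triggers --all-databases > /backups/MySQLData.sql
0 0 * * * sleep 30 && mongodump --out /backups/mongo
```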

RolandoMySQLDBA answered Nov 05 '22 14:11


I don't think there is an easy way to do this. Mongo doesn't have complex transactions with rollback support, so it is very hard to maintain such integrity. One approach is to think of the two stores as two ledgers: record all updates in the MySQL ledger, then replay them against the Mongo ledger to keep the two consistent. The other possible solution is to handle it at the application level and stop the writes during the backup.
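The application-level option can be sketched as a maintenance flag that the application checks before accepting writes. The flag file path is an assumption, and the actual dump commands are shown only as comments since they need live servers and credentials:

```shell
#!/bin/sh
# Sketch of stopping writes at the application level during backups.
# MAINTENANCE_FLAG is a hypothetical mechanism; substitute whatever your
# application uses to reject writes.
MAINTENANCE_FLAG=./MAINTENANCE

touch "${MAINTENANCE_FLAG}"   # application sees the flag and rejects writes
# mysqldump -u... -p... --single-transaction --all-databases > MySQLData.sql
# mongodump --out /backups/mongo
rm "${MAINTENANCE_FLAG}"      # writes resume
```

The window with writes disabled lasts only as long as the two dumps, but it does mean downtime for write traffic, which is exactly what the question hoped to avoid.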

Sid answered Nov 05 '22 13:11