I've got a very specific use case and because I'm not too familiar with database replication, I am open to suggestions and ideas about how to accomplish the following in the best possible way:
A web application + database is running on a remote server. Let's call this set-up R for remote.
Now suppose there are 3 separate geographical locations which need read+write access to the database. I will call these locations L1, L2 and L3.
The main problem: the remote server might be unavailable or the internet connection of one of the locations might not always work, rendering the remote application unavailable; but we want the application to work as a high availability solution (on-site) even when the remote server is down or when there is an internet connection problem.
Partial solution: So I was thinking about giving each geographical location its own server with a local copy of the web application. The web application itself can get updated when needed from a version control system automatically (for example using git hooks).
So far so good... (at least I believe so?)
But what about our data? The really tricky part seems to be the database replication. Let's assume no DNS or IP failover and assume that the user first tries to access the remote server directly and if this does not work, the user can still use the local server on-site instead. This all happens inside a web browser (or similar client).
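To make the access order concrete, here is a minimal sketch of "try the remote server first, fall back to the local one", written in Python for a non-browser client. The URLs, the `/health` path and the function names `is_reachable` and `pick_server` are hypothetical, chosen only for illustration; a browser client would need the equivalent logic in JavaScript.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if a GET to `url` answers with a 2xx/3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts and bad URLs.
        return False


def pick_server(candidates, probe=is_reachable):
    """Return the first candidate URL that the probe accepts, else None."""
    for url in candidates:
        if probe(url):
            return url
    return None


# Hypothetical endpoints: R first, then the on-site server of this location.
CANDIDATES = [
    "https://r.example.com/health",
    "http://l1-server.local/health",
]

if __name__ == "__main__":
    print(pick_server(CANDIDATES))
```

The probe is passed in as a parameter so the selection order can be exercised without any network access.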
One possible (but unsatisfactory) solution would be to use master-slave replication from R (master) to L1, L2 and L3 (slaves). When doing this asynchronously this should be quite fast? I think this is a viable solution for temporary local read-only database access when the main server is broken or can't be accessed.
But... what about read-write support? I suppose we would need multi-master replication in this case, but I am afraid that synchronous replication using something like (for example) MySQL Cluster or Galera would slow things down, especially since L1, L2 and L3 are on lower-bandwidth connections and are connected over WAN. (Also, L1, L2 or L3 might not always be online.)
The real question: How would you tackle this specific use case? At the moment I am leaning towards multi-master replication if it doesn't slow things down too much. The application itself will mainly be used by employees on-site, but also by some external people over WAN. Would multi-master replication work well? What if, for example, L1 is down for 24 hours and suddenly comes back online? What if R can't be accessed?
EXTRA: not my main question, but I also need the synchronized data to be sent securely over SSL. If possible, please take this into account in your answer.
Perhaps I am still forgetting some necessary details; if so, please respond with some feedback and I will try to update my question accordingly.
Please note that I haven't decided on a database yet and the database schema will be developed from scratch, so ideas using other databases or database engines are welcome as well. (At the moment I have most experience with MySQL and PostgreSQL)
As you are still undecided, I would strongly recommend that you have a look at MS-SQL merge replication. It is robust, highly reliable, replicates over LAN and HTTPS (so-called web replication), and is not that expensive.
The terminology differs from the MySQL master/slave idea: here we are talking about one publisher and multiple subscribers. All changes made at the subscriber level are collected and sent to the publisher, then redistributed to all subscribers (with, if needed, fancy options like 'filtered subscriptions').
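To illustrate the collect-then-redistribute flow, here is a toy model in Python. It is only a sketch of the idea, not how MS-SQL implements merge replication: the `Node` class, the `synchronize` function and the "highest logical time wins" conflict rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A publisher or subscriber holding rows as key -> (value, logical_time)."""
    name: str
    rows: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # local changes not yet uploaded

    def write(self, key, value, logical_time):
        """Record a local change and remember it for the next synchronization."""
        self.rows[key] = (value, logical_time)
        self.pending[key] = (value, logical_time)


def apply_change(node, key, change):
    """Assumed conflict rule: keep the change with the highest logical time."""
    current = node.rows.get(key)
    if current is None or change[1] > current[1]:
        node.rows[key] = change


def synchronize(publisher, subscribers):
    """Upload subscriber changes to the publisher, then redistribute to all."""
    for sub in subscribers:                      # phase 1: collect
        for key, change in sub.pending.items():
            apply_change(publisher, key, change)
        sub.pending.clear()
    for sub in subscribers:                      # phase 2: redistribute
        for key, change in publisher.rows.items():
            apply_change(sub, key, change)


if __name__ == "__main__":
    r, l1, l2 = Node("R"), Node("L1"), Node("L2")
    l1.write("customer-1", "edited at L1", 1)
    l2.write("customer-1", "edited at L2", 3)    # conflicting, later change
    l2.write("customer-2", "created at L2", 2)
    synchronize(r, [l1, l2])
    print(l1.rows == l2.rows == r.rows)
```

In this model a subscriber that was offline simply keeps accumulating entries in `pending` and takes part in the next `synchronize` call once it is reachable again.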
Standard architecture will then be:
We have been using this architecture for years, including:
Such an architecture is not available "off the shelf" with MySQL. I guess it could be built, but it would then certainly be a lot more expensive than just buying the corresponding MS-SQL licenses. Do not forget that the free SQLEXPRESS version of MS-SQL can be a subscriber.
Be careful: if you are planning to go with such a configuration, I would (really) strongly advise you to have all primary keys set to the uniqueIdentifier data type and randomly generated. This avoids the typical replication pitfall where PKs are set to int with automatic increment, and independent servers generate identical primary keys between two replications. (MS-SQL offers a tool to avoid such problems, where you can allocate PK ranges per server, but this solution is a real PITA ...)
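The pitfall is easy to reproduce outside any database. The following Python sketch simulates two disconnected servers generating keys between two replications; the variable names are illustrative, and `uuid.uuid4()` stands in for a randomly generated uniqueIdentifier value.

```python
import itertools
import uuid

# Two servers that each start their own auto-increment counter at 1.
l1_counter = itertools.count(1)
l2_counter = itertools.count(1)

l1_int_keys = [next(l1_counter) for _ in range(3)]
l2_int_keys = [next(l2_counter) for _ in range(3)]

# Every integer key generated offline on L1 also exists on L2.
int_collisions = set(l1_int_keys) & set(l2_int_keys)

# The same experiment with random 128-bit identifiers.
l1_guid_keys = [uuid.uuid4() for _ in range(3)]
l2_guid_keys = [uuid.uuid4() for _ in range(3)]

# A collision between random UUIDs is possible in theory but
# so unlikely that it can be ignored in practice.
guid_collisions = set(l1_guid_keys) & set(l2_guid_keys)

if __name__ == "__main__":
    print("colliding int keys:", sorted(int_collisions))
    print("colliding GUID keys:", len(guid_collisions))
```

With integer keys, rows created independently on each server claim the same identifiers and clash when merged; with random identifiers they can be merged without renumbering.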