When it comes to database replication, what is the use of global transaction identifiers? Why do we need it to prevent concurrency across the servers? How is that prevention achieved exactly? I tried to read the documentation at http://dev.mysql.com/doc/refman/5.7/en/replication-gtids.html but still could not understand it clearly. This may sound very basic but I would really appreciate it if someone could explain the concepts to me.

The reason for the Global Transaction ID is to allow a MySQL slave to know if it has applied a given transaction or not, to keep things in sync between Master and Slave. It can also be used for restarting a slave if a connection goes down, again to know the point in time. Without using GTIDs, replication must be controlled based on the position in a given binary transaction log file (bin log). This is much harder to manage than the GTID method. A master is the only server that is typically written to, so that slaves merely rebuild a copy of the master by applying each transaction in sequence. It is also important to understand that MySQL replication can run in one of 3 modes: <ul> <li>Statement-based: Each SQL statement is logged to the binlog and replicated as a statement to the slave. This can be in some cases ambiguous at the slave causing the data to not match exactly. (Most of the time it is fine for common uses).</li> <li>Row-based: In this mode MySQL replicates the actual data changes to each table, with a "before" and "after" picture of each row, which is fully accurate. This can result in a much larger binlog, for example if you have a bulk update query, like: UPDATE t1 SET c1 = 'a' WHERE c2 = 'b'. </li> <li>Mixed: In this mode, MySQL will use a mix of statement-based and row-based logging in the binlog. </li> </ul> I only mention the modes of replication, because it is mentioned in the doc you referenced that Row-based is the recommended option if you are using GTIDs. There is another option called Master-Master replication, where you can write to two masters (each acting as a slave for the other), but this requires a special configuration to ensure that the data written to each master is unique. It is much trickier to manage than a typical Master/Slave setup. Therefore, the prevention of writes to a Slave is something that you must ensure from your application for a typical replication process to function correctly. It is fine to read from a Slave, but you should not write to it. Note that the Slave can be behind the Master if you are using it for reads, so it is best to perform queries for things that can be behind the Master (like reports that are not critical up to the second or millisecond). You can ensure no writes to the Slave by making your common application user a read-only user for the Slave server, and a read-write user for the Master.

Why use GTIDs in MySQL replication?

Tags:

sql

mysql

replication

mysql-5.7

When it comes to database replication, what is the use of global transaction identifiers? Why do we need it to prevent concurrency across the servers? How is that prevention achieved exactly?

I tried to read the documentation at http://dev.mysql.com/doc/refman/5.7/en/replication-gtids.html but still could not understand it clearly. This may sound very basic but I would really appreciate it if someone could explain the concepts to me.

663

asked May 06 '14 04:05

Chong Lip Phang

2 Answers

The reason for the Global Transaction ID is to allow a MySQL slave to know if it has applied a given transaction or not, to keep things in sync between Master and Slave. It can also be used for restarting a slave if a connection goes down, again to know the point in time. Without using GTIDs, replication must be controlled based on the position in a given binary transaction log file (bin log). This is much harder to manage than the GTID method.

A master is the only server that is typically written to, so that slaves merely rebuild a copy of the master by applying each transaction in sequence.

It is also important to understand that MySQL replication can run in one of 3 modes:

Statement-based: Each SQL statement is logged to the binlog and replicated as a statement to the slave. This can be in some cases ambiguous at the slave causing the data to not match exactly. (Most of the time it is fine for common uses).
Row-based: In this mode MySQL replicates the actual data changes to each table, with a "before" and "after" picture of each row, which is fully accurate. This can result in a much larger binlog, for example if you have a bulk update query, like: UPDATE t1 SET c1 = 'a' WHERE c2 = 'b'.
Mixed: In this mode, MySQL will use a mix of statement-based and row-based logging in the binlog.

I only mention the modes of replication, because it is mentioned in the doc you referenced that Row-based is the recommended option if you are using GTIDs.

There is another option called Master-Master replication, where you can write to two masters (each acting as a slave for the other), but this requires a special configuration to ensure that the data written to each master is unique. It is much trickier to manage than a typical Master/Slave setup.

Therefore, the prevention of writes to a Slave is something that you must ensure from your application for a typical replication process to function correctly. It is fine to read from a Slave, but you should not write to it. Note that the Slave can be behind the Master if you are using it for reads, so it is best to perform queries for things that can be behind the Master (like reports that are not critical up to the second or millisecond). You can ensure no writes to the Slave by making your common application user a read-only user for the Slave server, and a read-write user for the Master.

answered Sep 24 '22 16:09

dbschwartz

Why do we need to prevent concurrency across the servers?

If I understood the question correctly, you are talking about consistency. If so, the answer is that you need keep a consistent state in a distributed system. For example, if my bank account information is replicated throughout several different servers it is fundamental that they have exactly the same € balance. Now imagine that I perform multiple money transactions (deposits/spendings) and at each one I was connected to a different server: concurrency problems would cause my account balance to be different at each server, which is unacceptable.

How is that prevention achieved exactly?

Using a master/slave approach. Amongst the servers, you have one server (the master) that is responsible for handling every writing operation, meaning that modifications to the database must be handled only by this server. The database of this master server is replicated to all other servers (the slaves), which are not allowed to modify the database but can be used to read the database (e.g. SELECT operations). Knowing that there is only one server allowed to modify the database, you do not have consistency issues.

what is the use of global transaction identifiers?

Communication between servers is asynchronous and a slave server is not required to be connected with the master at all times. Therefore, once a slave server reconnects with the master server, it may find that the master's database has been modified in the meanwhile, thus it must update its own database. The problem now is knowing amongst all modifications performed by the master server, which are the ones that the slave server already performed in a previous date and which are the ones that were not performed yet.

GTIDs address this issue: they uniquely identify each transaction performed by the master server. Now, the slave server can identify amongst all the transactions performed by the master server, which are the ones that were not seen before.

answered Sep 22 '22 16:09

João Matos

Related questions
                            
                                Select ONLY one row per category
                            
                                MySQL compare unix timestamps
                            
                                Should I use a JOIN function or run several queries in a loop structure?
                            
                                MySQLI Inner Join 2 Tables
                            
                                Import folder of .sql files into a single database
                            
                                How to change UserDoesNotExist SELECT behavior in Flask-peewee - python & mysql
                            
                                CREATE INDEX MySQL 5.6.13 On Production Database
                            
                                PHP PDO mySQL query returns column names and integers
                            
                                MySQL Update query with multiple values
                            
                                MySQL query with complex sorting
                            
                                How does (My)SQL evaluate chained LEFT/INNER JOINs?
                            
                                Difference between = and IS mysql
                            
                                How do I resolve a 'Resource is not set' error in magento?
                            
                                Delete record while inserting value to database
                            
                                SELECT FOR UPDATE holding entire table in MySQL rather than row by row
                            
                                Atomic MySQL transactions in Anorm
                            
                                Replace NOT IN with LEFT JOIN in SQL statement
                            
                                Magento Mass Import images with MAGMI - images excluded
                            
                                How can i know if percona is installed
                            
                                Insert into table when one value not exist in another table?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With