Data consistency in XA transactions

Suppose we have a database (e.g. Oracle) and a JMS provider (e.g. HornetQ) participating in an XA transaction. A message is sent to a JMS queue and some data are persisted in the database in the same distributed transaction. After the transaction is committed, a message consumer will read the persisted data and process them in a separate transaction.

Regarding the first XA transaction, the transaction manager (e.g. JBoss) may execute the following sequence of events:

  1. prepare (HornetQ)
  2. prepare (Oracle)
  3. commit (HornetQ)
  4. commit (Oracle)

What happens if the message consumer starts reading the data after commit is completed in HornetQ, but is still being executed in Oracle? Will the message consumer read stale data?
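The window in question can be reproduced in a small single-threaded sketch. This is a toy model, not JTA, JMS, or JDBC code: `ToyResource`, `observeBetweenCommits`, and the `order-42` data are hypothetical names, and the model assumes a resource that keeps serving the last committed version to readers until its own commit completes. A real resource manager may behave differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the commit window; none of these names are JTA, JMS, or JDBC API. */
final class CommitWindowDemo {

    /** A resource whose writes become visible to readers only after commit(). */
    static final class ToyResource {
        private final Map<String, String> committed = new HashMap<>();
        private final Map<String, String> pending = new HashMap<>();

        void write(String key, String value) {
            pending.put(key, value);
        }

        boolean prepare() {
            return true; // always votes yes in this sketch
        }

        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        /** Readers only ever see committed state (assumption of this model). */
        String read(String key) {
            return committed.get(key);
        }
    }

    /**
     * Runs two-phase commit in the order from the question and returns what a
     * consumer reads from the database right after the queue commit.
     */
    static String observeBetweenCommits(ToyResource queue, ToyResource database) {
        // Phase 1: both resources vote.
        if (!queue.prepare() || !database.prepare()) {
            throw new IllegalStateException("prepare failed");
        }
        // Phase 2: the commits are issued one after the other.
        queue.commit();
        // The message is visible now, so a consumer may react before the database commit.
        String orderId = queue.read("message");
        String seenByConsumer = database.read(orderId);
        database.commit();
        return seenByConsumer;
    }

    public static void main(String[] args) {
        ToyResource queue = new ToyResource();
        ToyResource database = new ToyResource();
        queue.write("message", "order-42");
        database.write("order-42", "PAID");

        String early = observeBetweenCommits(queue, database);
        System.out.println("read inside the window: " + early);
        System.out.println("read after both commits: " + database.read("order-42"));
    }
}
```

In this model the first line prints `null` and the second prints `PAID`: the consumer that reacts between the two commits finds no row, while the same read succeeds once both commits are done.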

The question can be generalized to any kind of multiple resources participating in XA transactions, i.e. is there a possibility for a small time window (when commit phases are executed) in which a reader from another concurrent transaction can get an inconsistent state (by reading committed data from one resource and stale data from another one)?

I would say that the only way for transactional resources to prevent this is to block all readers of affected data once the prepare phase is completed until the commit is issued. This way the example message consumer mentioned above would block until data is committed in the database.

Dragan Bozanovic asked Jun 08 '16 18:06


2 Answers

Unfortunately, XA transactions don't guarantee this kind of consistency across resources. When mapped to the CAP theorem, XA solves availability and partition tolerance across multiple datastores. In doing so it must sacrifice consistency. When using XA you have to embrace eventual consistency.

In any event, building systems that are CP or AP is hard enough that you will face this problem regardless of your datastore or transactional model.

Justin answered Sep 28 '22 07:09


I have some experience with a slightly different environment, based on WebLogic JMS and Oracle 11g. In this answer I assume that it works exactly the same way. I hope my answer helps.

In our case there was a "distant" system that had to be notified about various events that happened inside the local system. The other system also read from our database, so the use case seems almost identical to your problem. The sequence of events was exactly the same as yours. On the test systems there was not a single failure. Everyone thought it would work, but some of us doubted whether it was the correct solution. Once the software hit production, some of the BPM processes ran unpredictably. So a simple answer to your question: yes, it is possible, and everyone should be aware of it.

Our solution was (in my opinion) not a well-planned one, but we recognised that the small time window between the two commits was breaking the system, so we added a delay to the queue (if I remember correctly, it was about 1-2 minutes). That was enough for the other commit to finish and for the consumer to read consistent data. In my view it is not the best solution, because it does not solve the synchronisation problem (what if an Oracle transaction takes longer than 1-2 minutes?).

Here is a great blog post that is worth reading; the last solution in it seems the best to me. We implemented it in another system and it works much better. It is important to note that you should limit the retries (re-reads) to prevent "stuck" threads, and add some error reporting. With these restrictions I have not been able to find a better solution so far, so if anyone has a better option, I am looking forward to hearing it. :)
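A bounded re-read of the kind mentioned above might look roughly like this. It is only a sketch under assumptions: `BoundedReread`, `readWithRetry`, and its parameters are hypothetical names, the `Supplier` stands in for the real database query, and the code is not taken from the linked blog post.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper for a consumer that re-reads until the committed row is visible. */
final class BoundedReread {

    /**
     * Calls lookup up to maxAttempts times, sleeping pauseMillis between attempts.
     * Returns empty when the data never shows up, so the caller can report the
     * error instead of keeping the thread stuck.
     */
    static <T> Optional<T> readWithRetry(Supplier<Optional<T>> lookup,
                                         int maxAttempts,
                                         long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a database query: the row becomes visible on the third read.
        int[] reads = {0};
        Supplier<Optional<String>> lookup =
                () -> ++reads[0] >= 3 ? Optional.of("PAID") : Optional.<String>empty();

        Optional<String> row = readWithRetry(lookup, 5, 10L);
        System.out.println(row.orElse("gave up, report the message for manual handling"));
    }
}
```

When the result is empty, the consumer would typically log the failure and roll back or park the message rather than loop forever; which of those fits depends on the redelivery settings of the JMS provider.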

Edit: typos.

Hash answered Sep 28 '22 06:09