Online mnesia recovery from network partition [closed]

Tags:

erlang

mnesia

Is it possible to recover from a network partition in an mnesia cluster without restarting any of the nodes involved? If so, how does one go about it?

I'm interested specifically in knowing:

How this can be done with the standard OTP mnesia (v4.4.7)
What custom code, if any, one needs to write to make this happen (e.g. subscribe to mnesia running_partitioned_network events, determine a new master, merge records from non-master to master, force load table from the new master, clear the running_partitioned_network event -- example code would be greatly appreciated).
Or, that mnesia categorically does not support online recovery and requires that the node(s) that are part of the non-master partition be restarted.

While I appreciate the pointers to general distributed systems theory, in this question I am interested in erlang/OTP mnesia only.

847

asked Mar 08 '09 23:03

archaelus

1 Answer

After some experimentation I've discovered the following:

Mnesia considers the network to be partitioned if two nodes disconnect and then reconnect without an Mnesia restart in between.
This is true even if no Mnesia read/write operations occur during the disconnection.
Mnesia itself must be restarted in order to clear the partitioned network event - you cannot force_load_table after the network is partitioned.
Only Mnesia needs to be restarted in order to clear the partitioned network event. You don't need to restart the entire node.
Mnesia resolves the network partitioning by having the newly restarted Mnesia node overwrite its table data with data from another Mnesia node (the startup table load algorithm).
Generally nodes will copy tables from the node that's been up the longest (this was the behaviour I saw; I haven't verified that this is explicitly coded for and not a side-effect of something else). If you disconnect a node from a cluster, make writes in both partitions (the disconnected node and its old peers), shut down all nodes and start them all back up again, starting the disconnected node first, the disconnected node will be considered the master and its data will overwrite the data on all the other nodes. There is no table comparison/checksumming/quorum behaviour.
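
The partition notification mentioned in the question can be observed by subscribing to Mnesia's system events. This is only a minimal sketch, assuming Mnesia is already running on the local node; the module name partition_watch and the logging call are illustrative and not part of Mnesia:

```erlang
-module(partition_watch).
-export([start/0, is_partition_event/1]).

%% Returns {true, Node} for Mnesia's running-partitioned notification,
%% false for any other message.
is_partition_event({mnesia_system_event,
                    {inconsistent_database, running_partitioned_network, Node}}) ->
    {true, Node};
is_partition_event(_Other) ->
    false.

%% Spawn a process that subscribes to Mnesia system events and logs
%% every partition notification it receives.
start() ->
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        Msg ->
            case is_partition_event(Msg) of
                {true, Node} ->
                    error_logger:warning_msg("Mnesia partitioned from ~p~n", [Node]);
                false ->
                    ok
            end,
            loop()
    end.
```

Detecting the event does not clear it; per the findings above, that still requires restarting Mnesia on one side.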

So to answer my question, one can perform semi-online recovery by executing mnesia:stop(), mnesia:start() on the nodes in the partition whose data you decide to discard (which I'll call the losing partition). Executing the mnesia:start() call will cause the node to contact the nodes on the other side of the partition. If you have more than one node in the losing partition, you may want to set the master nodes for table loading to nodes in the winning partition - otherwise I think there is a chance a node will load tables from another node in the losing partition and thus return to the partitioned network state.
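
A rough sketch of that restart sequence, to be run on each node in the losing partition. The module and function names are hypothetical, the 60-second timeout is arbitrary, and it assumes a disc-based schema so that the master-node setting survives the restart:

```erlang
-module(partition_recover).
-export([rejoin/1]).

%% Winners: nodes in the winning partition whose data should be kept.
%% Local table contents are discarded and reloaded from those nodes.
rejoin(Winners) ->
    %% Tell Mnesia to load tables only from the winning partition.
    ok = mnesia:set_master_nodes(Winners),
    %% Restart Mnesia only; the Erlang node itself stays up.
    stopped = mnesia:stop(),
    ok = mnesia:start(),
    %% Block until the local tables are loaded again.
    %% Returns ok, {timeout, Tabs} or {error, Reason}.
    mnesia:wait_for_tables(mnesia:system_info(local_tables), 60000).
```

For example, partition_recover:rejoin(['a@host1', 'b@host2']), where the node names are placeholders. The master-node setting is local to the node and stays in effect for later restarts; as far as I know, calling mnesia:set_master_nodes([]) afterwards returns the node to the normal load algorithm.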

Unfortunately mnesia provides no support for merging/reconciling table contents during the startup table load phase, nor does it provide for going back into the table load phase once started.

A merge phase would be suitable for ejabberd in particular, as the node would still have user connections and thus know which user records it owns/should be the most up-to-date for (assuming one user connection per cluster). If a merge phase existed, the node could filter userdata tables, save all records for connected users, load tables as per usual and then write the saved records back to the mnesia cluster.
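
Since no such phase exists, the closest approximation I can think of is to do the save and write-back by hand around the restart. The following is only a sketch of that idea, not something Mnesia or ejabberd provides: the module name, the table and the key list are hypothetical, and it assumes each table's record name equals the table name (which mnesia:write/1 requires):

```erlang
-module(partition_merge).
-export([save_owned/2, restore/1]).

%% Before stopping Mnesia: read the records this node considers itself
%% the owner of, e.g. the keys of locally connected users.
save_owned(Tab, Keys) ->
    lists:append([mnesia:dirty_read(Tab, Key) || Key <- Keys]).

%% After Mnesia has restarted and reloaded its tables: write the saved
%% records back in a single transaction. Returns {atomic, ok} on success.
restore(Records) ->
    mnesia:transaction(fun() ->
                               lists:foreach(fun mnesia:write/1, Records)
                       end).
```

The saved records live only in the calling process between the two calls, so they are lost if that process or the node dies during the restart.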

132

answered Sep 17 '22 22:09

archaelus
