 

Is there any way to get mutual exclusion on a Chef node?

For example, if a process updates a node while chef-client is running, chef-client will overwrite the node data:

  1. chef-client gets node data (state 1)
  2. The process A gets node data (state 1)
  3. Process A updates the node data locally (state 2)
  4. Process A saves the node data (state 2)
  5. chef-client updates the node data locally (state 2*)
  6. chef-client saves the node data, which does not contain the changes from process A (state 2). chef-client overwrites the node data (state 2*).
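To make the race concrete, here is a minimal Python simulation of the six steps above. The `server` dict and the `get_node`/`save_node` helpers are hypothetical stand-ins for the Chef server and a whole-object node save, not Chef's API; the point is only that when a save replaces the entire node object, the last writer wins.

```python
import copy

# Hypothetical stand-in for the node object stored on the Chef server.
server = {"node1": {"attrs": {"version": "state 1"}}}


def get_node(name):
    # Each process works on its own copy, like a client fetching the node JSON.
    return copy.deepcopy(server[name])


def save_node(name, data):
    # The save replaces the whole stored object: last writer wins.
    server[name] = copy.deepcopy(data)


chef_view = get_node("node1")          # 1. chef-client gets node data (state 1)
a_view = get_node("node1")             # 2. process A gets node data (state 1)
a_view["attrs"]["admin_setting"] = "set from UI"   # 3. A changes it locally
save_node("node1", a_view)             # 4. A saves (state 2)
chef_view["attrs"]["version"] = "state 2*"         # 5. chef-client changes its stale copy
save_node("node1", chef_view)          # 6. chef-client saves (state 2*)

print(server["node1"]["attrs"])        # → {'version': 'state 2*'}  (A's change is lost)
```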

The same problem occurs if two processes save node data at the same moment.

EDIT

We need external modification because we have a nice UI on top of the Chef server to manage a lot of computers remotely, displayed as a tree (similar to LDAP). An administrator can update recipe values from there. This project is open source: https://github.com/gecos-team/

Although we have a semaphore system, we have found that two or more simultaneous requests can cause a concurrency problem:

  1. The regular case is that the system works
  2. But sometimes the system does not work

EDIT 2

I have added a document with a lot of information about our problem.

asked Dec 25 '22 12:12 by Goin


2 Answers

Here is what I would do in this case:

  1. Have a distributed lock mechanism (DLM) like this one (I'm not using it myself; it is just to illustrate the idea).
  2. Build a start/report/error handler which will:
    • at start, acquire a lock on the node name in the DLM from step 1;
      • if it can't, abort the run or wait until the lock is free;
    • at the end (report or error), release the lock.
  3. Modify the external system to do the same as the handler above: acquire a lock before modifying and release it when done.
  4. Pay attention to the lock lifetime! It should be longer than your Chef run plus a margin, and the UI should check that its lock is still held before writing, and abort if not.
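A rough sketch of the handler-plus-lock idea, in Python for illustration only (real Chef handlers are Ruby classes). `LeaseLockManager` is a hypothetical in-memory stand-in for the DLM, `acquire_or_abort` plays the start handler, `ui_write` plays the external system, and the TTL values are assumptions. Lease expiry is why point 4 matters: a lock left by a crashed run is eventually freed, but a lease shorter than the run can expire mid-converge.

```python
import threading
import time
import uuid


class LeaseLockManager:
    """Hypothetical in-memory stand-in for a DLM; leases expire after `ttl` seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._mutex = threading.Lock()
        self._leases = {}  # node name -> (owner token, expiry time)

    def acquire(self, node, ttl):
        """Take the lease and return an owner token, or None if it is held."""
        with self._mutex:
            now = self._clock()
            lease = self._leases.get(node)
            if lease is not None and lease[1] > now:
                return None  # another owner holds a live lease
            token = uuid.uuid4().hex
            self._leases[node] = (token, now + ttl)
            return token

    def still_held(self, node, token):
        """True if `token` still owns an unexpired lease on `node`."""
        with self._mutex:
            lease = self._leases.get(node)
            return lease is not None and lease[0] == token and lease[1] > self._clock()

    def release(self, node, token):
        """Release only if `token` is still the owner (never free someone else's lease)."""
        with self._mutex:
            lease = self._leases.get(node)
            if lease is not None and lease[0] == token:
                del self._leases[node]


def acquire_or_abort(dlm, node, ttl, timeout, poll=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Start-handler logic: wait until the lock is free, abort after `timeout`."""
    deadline = clock() + timeout
    while True:
        token = dlm.acquire(node, ttl)
        if token is not None:
            return token
        if clock() >= deadline:
            raise TimeoutError(f"{node} is locked, aborting the run")
        sleep(poll)


def ui_write(dlm, node, token, write):
    """UI side: verify the lease right before writing, abort if it expired."""
    if not dlm.still_held(node, token):
        raise RuntimeError(f"lock on {node} expired, aborting the write")
    write()


class FakeClock:
    """Manual clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
dlm = LeaseLockManager(clock=clock)
RUN_TTL = 30 * 60 + 5 * 60  # assumed: 30-minute worst-case run + 5-minute margin

chef_token = dlm.acquire("node1", RUN_TTL)  # start handler takes the lock
blocked = dlm.acquire("node1", 60)          # the UI is refused meanwhile (None)
dlm.release("node1", chef_token)            # report/error handler releases it
ui_token = dlm.acquire("node1", 60)         # now the UI gets the lock
ui_write(dlm, "node1", ui_token, lambda: print("UI saved node1"))
```

`ui_write` still has a small gap between the check and the write, which is one more reason to keep a margin on the lease lifetime.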

A way to get rid of the handler (though you still need a lock for the UI) is to take advantage of the Reporting API (a premium feature of Chef 12, free under 25 nodes, license needed above that).

This gets a bit convoluted and requires the node to do reporting (so the Chef server URL should end with organizations/ and the client version should be above 11.16, or use the backport).

Then you can query the runs for a node, check whether one of them is in started status, and wait until it has ended.
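The polling step could look roughly like this. `fetch_runs` is a hypothetical callable wrapping whatever Reporting API query you use (no endpoint is shown here), and each run is assumed to be a dict with a `"status"` key; only the `"started"` value comes from the answer, other status values are placeholders. A check like this only narrows the race window, since a run can start right after it returns, which is why the UI still needs its own lock.

```python
import time


def wait_until_no_run_in_progress(fetch_runs, node, timeout, poll=10.0,
                                  clock=time.monotonic, sleep=time.sleep):
    """Return True once no run for `node` is "started", False on timeout.

    `fetch_runs(node)` is hypothetical: it should return a list of dicts,
    each with a "status" key, built from your Reporting API query.
    """
    deadline = clock() + timeout
    while True:
        runs = fetch_runs(node)
        if not any(run.get("status") == "started" for run in runs):
            return True  # nothing converging right now
        if clock() >= deadline:
            return False
        sleep(poll)


# Usage with canned responses: busy twice, then finished.
responses = iter([
    [{"status": "started"}],
    [{"status": "started"}],
    [{"status": "finished"}],  # placeholder for any non-"started" status
])
calls = []


def fake_fetch(node):
    calls.append(node)
    return next(responses)


ok = wait_until_no_run_in_progress(fake_fetch, "node1", timeout=60,
                                   poll=0, sleep=lambda s: None)
```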

answered Feb 05 '23 17:02 by Tensibai


Chef doesn't implement transactions, and by default it does not automatically re-converge nodes when they are updated. It is open to race conditions, which you can try to reduce by updating node attributes from within a chef-client run (right before you do something critical), but you will never end up with a reliable, working setup.

The longer the converge runs, the wider the gap and the higher the risk of corruption.

Chef's node attributes are only useful for debugging or for modification by the chef-client running on the node itself, and are pretty much useless in highly concurrent/dynamic environments.

I would use Consul.io to coordinate semaphores and key/value configuration data in real time. Access it from Chef recipes or LWRPs through one of the various interfaces Consul provides (HTTP, DNS, …).

You can implement a very simple push-job task to run chef-client (IMHO easier and more powerful than Chef's "push jobs" feature, though not integrated with Chef's ACL/user management), which is also guarded by a distributed semaphore or uses the "Leader Election" feature. Of course, you'll have to add this logic to your node update script too.

chef-client will then acquire a lock on start and block you from manipulating data while it converges, and vice versa.
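A minimal illustration of the effect: when the chef-client wrapper and the node update script both hold the same lock for their whole get, change, save cycle, neither overwrites the other. Here a `threading.Lock` inside one Python process stands in for the Consul semaphore, and `server` stands in for the stored node object; both are assumptions for the sketch, not Consul or Chef APIs.

```python
import copy
import threading
import time

# Hypothetical stand-ins: `server` is the stored node object, `node_lock` is the
# distributed semaphore that both chef-client and the update script must hold.
server = {"attrs": {}}
node_lock = threading.Lock()


def read_modify_write(key, value, work_seconds):
    with node_lock:                    # held for the whole get -> change -> save cycle
        node = copy.deepcopy(server)   # get node data
        time.sleep(work_seconds)       # simulate converge or UI editing time
        node["attrs"][key] = value     # change it locally
        server.clear()
        server.update(node)            # save: whole-object overwrite, as before


chef_run = threading.Thread(target=read_modify_write, args=("converged", True, 0.2))
ui_update = threading.Thread(target=read_modify_write,
                             args=("admin_setting", "set from UI", 0.0))
chef_run.start()
time.sleep(0.05)   # the UI update arrives while chef-client is "converging"
ui_update.start()  # it blocks on the lock until the run has saved
chef_run.join()
ui_update.join()
print(server["attrs"])  # both keys survive, whichever order the lock was granted in
```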

answered Feb 05 '23 17:02 by Roland