Executing a command on a remote server with decoupling, redundancy, and asynchrony

I have a few servers that need to execute commands on other servers. For example, a Bitbucket Server post-receive hook executing a git pull on another server. Another example is a CI server pulling a new Docker image and restarting an instance on another server.

I would normally use ssh for this, creating a user/group specifically for the job with limited permissions.

A few downsides with ssh:

  • A synchronous ssh call means a git push has to wait until the remote command completes.
  • If a host is unreachable for any reason, the ssh command fails.
  • Maintaining keys, users, and sudoers permissions can become unwieldy.

A few possibilities:

  • Find an open-source, out-of-the-box solution (I have searched with no luck so far)
  • Set up a REST API on each server that accepts calls with some type of authentication, e.g. POST https://server/git/pull/?apikey=a1b2c3 (rough sketch below)
  • Set up Python/Celery to execute tasks on a different queue for each host. This means a Celery worker on each server that can execute commands, and possibly a service that accepts REST API calls and converts them to Celery tasks.
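
For the REST API option, this is roughly what I have in mind (Flask here is just for illustration; the key, route, and repo path are placeholders):

# Rough sketch of the REST endpoint idea (Flask chosen only for
# illustration; the key, route, and repo path are placeholders).
import hmac
import subprocess

from flask import Flask, abort, request

app = Flask(__name__)
API_KEY = "a1b2c3"  # placeholder; would live in config, not source

@app.route("/git/pull/", methods=["POST"])
def git_pull():
    # Constant-time comparison avoids leaking the key via timing.
    if not hmac.compare_digest(request.args.get("apikey", ""), API_KEY):
        abort(403)
    # Fire and forget: respond immediately and let the pull run on its own.
    subprocess.Popen(["git", "-C", "/srv/app", "pull", "--ff-only"])
    return "", 202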

Is there a nice solution to this problem?

asked Nov 09 '22 by gak

1 Answer

Defining the problem

  1. You want to be able to trigger a remote task without waiting for it to complete.

This can be achieved in any number of ways, including with SSH. You can execute a remote command without waiting for it to complete by backgrounding it and closing or redirecting all of its I/O streams, e.g. like this:

ssh user@host "/usr/bin/foobar </dev/null >/dev/null 2>&1 &"

  2. You want to be able to defer the task if the host is currently unavailable.

This requires a queuing/retry system of some kind. You will also need to decide whether the target hosts will be querying for messages ("pull") or whether messages will be sent to the target hosts from elsewhere ("push").
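
To illustrate the push model, here is a minimal Python sketch: a task is attempted immediately, spooled to a local file if the host is down, and retried by a periodic job such as cron. The spool path and task format are assumptions, not a finished design.

# Sketch of a push-with-local-spool retry loop (illustrative; the spool
# path and "user@host<TAB>command" task format are assumptions).
import subprocess
from pathlib import Path

SPOOL = Path("/var/spool/remote-tasks/pending")  # hypothetical spool file

def try_run(target, command):
    # Background and redirect on the remote side so ssh returns immediately.
    remote = f"{command} </dev/null >/dev/null 2>&1 &"
    try:
        result = subprocess.run(["ssh", target, remote], timeout=10)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

def submit(target, command):
    # Push the task now; spool it for the retry job if the host is down.
    if try_run(target, command):
        return
    SPOOL.parent.mkdir(parents=True, exist_ok=True)
    with SPOOL.open("a") as f:
        f.write(f"{target}\t{command}\n")

def retry_spool():
    # Run from cron: re-attempt every spooled task, keep the failures.
    if not SPOOL.exists():
        return
    lines = SPOOL.read_text().splitlines()
    SPOOL.write_text("")  # simplistic; a real version would lock the file
    for line in lines:
        target, command = line.split("\t", 1)
        submit(target, command)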

  3. You want to simplify access control as much as possible.

There's no way to completely avoid this issue. One solution would be to put most of the authentication logic in a centralized task server. This splits the problem into two parts: configuring access rights in the task server, and configuring authentication between the task server and the target hosts.
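
For example, the task server and each target host could share a per-host secret and sign each request with an HMAC. A minimal sketch, assuming the secret is exchanged out of band (the secret and payload format here are placeholders):

# Shared-secret request signing between the task server and a target host
# (assumption: one secret per host, exchanged out of band).
import hashlib
import hmac

SECRET = b"per-host-shared-secret"  # placeholder

def sign(payload):
    # Task server side: compute the signature sent along with the request.
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(payload, signature):
    # Target host side: reject any request whose signature does not match.
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)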

Example solutions

  • Hosts attempt to start tasks over SSH, using the method above for asynchrony (along the lines of the spool sketch earlier). If a host is unavailable, the task is written to a local file, and a cron job periodically retries sending failed tasks. Access control is via SSH keys.

  • Hosts add tasks by writing command files to an SFTP server. A cron job on each target host periodically checks for new commands and executes any it finds. Access control is managed via SSH keys on the SFTP server.

  • Hosts post tasks to a REST API, which adds them to a queue. A Celery daemon on each target host consumes from the queue and executes tasks (see the sketch after this list). Access is managed primarily by credentials sent to the task queuing server.

  • Hosts post tasks to an API, which adds them to a queue. Task consumer nodes pull tasks off the queue and send requests to an API on each target host. Authentication is managed by a cryptographic signature from the sender appended to each request, verified by the task server on the target host (much like the HMAC sketch above).
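
To make the Celery option concrete, a minimal sketch follows; the broker URL, queue name, and allow-list are placeholders rather than a recommended setup.

# Celery variant: each target host runs a worker bound to its own queue,
# e.g.  celery -A tasks worker -Q host1
# (broker URL, queue name, and allow-list are illustrative placeholders)
import subprocess

from celery import Celery

app = Celery("tasks", broker="redis://broker-host:6379/0")

@app.task(autoretry_for=(subprocess.CalledProcessError,),
          retry_backoff=True, max_retries=5)
def run_command(argv):
    # Crude allow-list so the queue cannot run arbitrary binaries.
    if argv[0] not in ("git", "docker"):
        raise ValueError(f"command not allowed: {argv[0]}")
    return subprocess.run(argv, check=True).returncode

# The API or hook side routes a task to a specific host's queue:
# run_command.apply_async(args=[["git", "-C", "/srv/app", "pull"]],
#                         queue="host1")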

You can also look into tools that provide some or all of the required functionality out of the box. For example, some Google searching turned up Rundeck, which has job scheduling capabilities and a REST API. You should also consider whether you can leverage any automated deployment or management tools already present in your system.

Conclusions

Ultimately, there's no single right answer to this question. It really depends on your particular needs. Ask yourself: How much time and effort do you want to spend creating this system? What about maintenance? How reliable does it need to be? How much does it need to scale? And so on, ad infinitum...

answered Nov 14 '22 by augurar