
free secure distributed make system for linux [closed]

Are there any good language-agnostic distributed make systems for Linux that are secure and free?

Background Information:

I run scientific experiments (computer-science ones) that sometimes have large dependency trees, occasionally on the order of thousands or tens of thousands of tree nodes. This dependency tree is over the data files, data processing executables, and results files.

I've experimented with various techniques over the years including:

  1. Rolling my own dependency tracker using a database and running a script on each worker machine. This can get a bit cumbersome, especially when trying to work with non-scripting languages.
  2. Putting all the processing commands in a single makefile, with pseudo-targets that can be manually "built" on different worker machines. This requires no special tools, but it can be a pain to manually break up the work into evenly-sized pseudo-target chunks and to invoke "make" correctly on each worker box.
  3. distmake: automatically distribute the execution of commands from a single makefile...
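To make the dependency-tracking idea in item 1 concrete, here is a minimal sketch, assuming Python 3.9+ and its standard `graphlib` module. The file names and the `ready_batches` helper are hypothetical and only illustrate the shape of the problem: each batch holds targets whose prerequisites are all finished, so the targets within a batch could be handed to different worker machines. It does not address the security or process-cleanup concerns.

```python
from graphlib import TopologicalSorter


def ready_batches(deps):
    """Yield lists of targets whose prerequisites are all done.

    deps maps each target to the set of targets it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        # Everything returned by get_ready() is independent of the rest
        # of the batch, so it could run in parallel on separate workers.
        batch = sorted(sorter.get_ready())
        yield batch
        sorter.done(*batch)


# Hypothetical file names, for illustration only.
deps = {
    "clean.dat": {"raw.dat"},
    "features.dat": {"clean.dat"},
    "stats.txt": {"clean.dat"},
    "results.txt": {"features.dat", "stats.txt"},
}

batches = list(ready_batches(deps))
for i, batch in enumerate(batches):
    print(i, batch)
```

A real tracker would also need to record which targets are already up to date and to handle failed jobs, which is where the hand-rolled approach gets cumbersome.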

I'm basically looking for something like distmake, but more secure. As far as I can tell, distmake essentially leaves a wide-open backdoor into each worker node.

It would also be nice if a replacement were more robust than distmake. If you break out of the main distmake call, it can shut down the backdoor servers, but it doesn't properly kill the executing processes on the worker nodes.


Clarifications:

I am processing data with the makefile, not compiling and linking with gcc. From what I read in the documentation, distcc is a specialized tool for distributing gcc. I'll be running my own executables on very large data files hosted on a shared filesystem, not gcc on source files, so distcc isn't helpful.

The worker nodes are externally-visible machines, so I want any worker daemons to be at least as secure as ssh. As best I can tell without reading the source, distmake worker daemons open up a port and will accept commands from anyone who attaches to it. They will execute the commands as the user who started the daemon.

Mr Fooz asked Dec 30 '08 02:12


1 Answer

Dependencies are hard to manage, and I don't know of any perfect system that does what you want without a significant amount of work.

The closest thing that I've used is the following setup:

  - a Condor queue to manage the machines in your cluster
  - the Condor DAGMan meta-scheduler to submit jobs that are interdependent

DAGMan stands for Directed Acyclic Graph Manager; a directed acyclic graph is used to represent the dependencies between your jobs.
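As a rough illustration of how job dependencies map onto a DAG description, here is a minimal Python sketch that writes out `JOB` and `PARENT ... CHILD` lines, the two basic statements of a DAGMan input file. The job names, the `.sub` submit-file naming, and the `dag_lines` helper are all assumptions made for the example, not part of our actual setup.

```python
def dag_lines(deps, submit_file=lambda job: f"{job}.sub"):
    """Return DAGMan-style lines for a dependency mapping.

    deps maps each job name to the set of jobs it depends on.
    submit_file maps a job name to its submit description file
    (the "<job>.sub" convention here is hypothetical).
    """
    # Every job mentioned anywhere, as a target or as a prerequisite.
    jobs = sorted(set(deps) | {p for ps in deps.values() for p in ps})
    lines = [f"JOB {job} {submit_file(job)}" for job in jobs]
    for child in sorted(deps):
        parents = sorted(deps[child])
        if parents:
            lines.append(f"PARENT {' '.join(parents)} CHILD {child}")
    return lines


# Hypothetical job names, for illustration only.
deps = {
    "clean": set(),
    "features": {"clean"},
    "stats": {"clean"},
    "report": {"features", "stats"},
}

lines = dag_lines(deps)
dag_text = "\n".join(lines) + "\n"
print(dag_text)
```

The resulting text would be saved to a file and handed to `condor_submit_dag`; each job still needs its own submit description file, which this sketch does not generate.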

We've used this very successfully for an iterative scientific protocol in our lab, although getting the initial implementation running was a learning experience even for a very talented postdoc. It does require that you set up and run a Condor cluster, which is non-trivial, but I assume you have either Condor or something similar to manage all of your machines. Sun GridEngine may have something analogous that I don't know about.

James Thompson answered Sep 23 '22 06:09