Condor, Sun Grid Engine, or something else?

I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else).

We often have lots of unused Windows XP workstations. The hope is that we could wake them with wake-on-LAN, run all our jobs, and then shut the machines down automatically. We'd mainly be running Matlab, Java or Python simulations, either Monte Carlo runs or parameter explorations.

With my limited knowledge of Condor, it sounds like using the vm universe might be a convenient way of taking care of snapshots without having to modify existing code.

Is SGE or something else better than Condor for this kind of work?

Pengin asked Dec 06 '09 11:12


3 Answers

SGE doesn't really support Windows. Its Windows port comes with all kinds of caveats and missing bits.

I've been running Condor pools for many years now, and it is a superb HTC (high-throughput computing) setup for both cycle-stealing and dedicated, always-on hardware, on Linux and Windows machines. The recent addition of the Rooster daemon lets you put machines to sleep between job cycles and wake them up when new work appears in the pool. Condor also has an active and very helpful support community.

Checkpointing is the only Condor feature not available on Windows; everything else is there. With the addition of the VM universe, checkpointing is becoming less and less useful anyway. To use checkpointing successfully you need to be able to relink your entire code stack, so if you're running Matlab jobs, even on Linux, checkpointing isn't going to be possible.

If you have specific questions about getting Condor running on Windows, I'd be happy to answer them and share my experiences with it. I run Condor across 4 pools around the globe, with a total of about 1500 dedicated machines and some 1000 or so additional desktop machines that are available as users care to donate them.
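Since checkpointing is off the table for Matlab, a parameter exploration is usually run as many short, independent jobs. The Python sketch below shows one way to generate a Condor submit description for such a sweep. The wrapper name `run_sim.bat`, the output file names, and the helper `build_submit_description` are hypothetical placeholders for illustration; only the basic submit commands (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are assumed, and a real pool will likely need extra settings such as file-transfer options.

```python
def build_submit_description(executable, param_values, log_name="sweep.log"):
    """Return the text of a Condor submit description with one job per parameter.

    `executable` and the output file names are illustrative placeholders.
    """
    lines = [
        "universe = vanilla",            # plain jobs, no relinking or checkpointing
        f"executable = {executable}",
        f"log = {log_name}",
    ]
    for index, value in enumerate(param_values):
        lines += [
            f"arguments = {value}",      # parameter passed on the command line
            f"output = out.{index}.txt", # stdout of this job
            f"error = err.{index}.txt",  # stderr of this job
            "queue",                     # submit one job with the settings above
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: three values of a hypothetical simulation parameter.
    print(build_submit_description("run_sim.bat", [0.1, 0.2, 0.3]))
```

The generated text can be saved to a file and handed to `condor_submit`. Each `queue` line creates one job, so a failed run can be resubmitted on its own without repeating the whole sweep.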

Ian C. answered Oct 06 '22 14:10


I'd start with Condor. It has good support for Windows, and newer versions have built-in, highly configurable support for sending wake-on-LAN packets when jobs are able to run on particular machines. It can also shut the machines down based on user-defined policies.
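For anyone who wants to check that their workstations respond to wake-on-LAN before setting up Condor's own support, here is a minimal Python sketch of the standard magic packet: 6 bytes of `0xFF` followed by the target MAC address repeated 16 times, sent as a UDP broadcast. This is not how Condor does it internally. The function names are made up for illustration, and the broadcast address and port 9 are assumptions that may need changing for your network.

```python
import socket


def build_magic_packet(mac):
    """Build a wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the MAC repeated 16 times (102 bytes in total).
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC address; replace with one from your own pool.
    send_magic_packet("00:11:22:33:44:55")
```

Wake-on-LAN usually also has to be enabled in each machine's BIOS and network adapter settings, and broadcasts often do not cross subnets, so a test like this is best run from the same network segment as the workstations.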

Greg answered Oct 06 '22 15:10


Since Oracle's takeover of SGE (Sun Grid Engine), the Open Grid Scheduler project has continued to offer an open-source Grid Engine.

http://gridscheduler.sourceforge.net/

Sun Grid Engine answered Oct 06 '22 15:10