
The Neglected Stakeholder a.k.a. the System Administrator

Some time ago I came to realize that almost every customer project that I have been working on so far has neglected an important group of stakeholders: the system administrators.

These silent heroes are usually involved only at the end of a project and are left with an executable black box of bits that they have to install, support, and maintain for years to come. Whenever an issue occurs with this black box, they have to resolve it using whatever scraps of information and tool support the black box or the underlying platform makes available; if that is not sufficient, they have to improvise.

If they had been involved as stakeholders in the project from the beginning, they would have had a chance to predict potential problems and inform the project team about them. But reality is different: even though I as a developer would love to involve the system administrator as an extra stakeholder, external factors may prevent this from happening.

In these situations I would like to help our silent heroes as well as I can. So my question is:

What would a system administrator wish from us developers when we develop the systems they will have to maintain?

If you are a system administrator please tell a war story about a difficult problem you once had and what developers could have done to make it easier for you to solve it.

Jonas Kongslund asked Nov 21 '08 00:11


2 Answers

Various things, including (but unlikely to be limited to) these, which are not in priority order:

  • No requirement to use privileged install
  • Option to use privileged install
  • Option for distributed install (so it can be installed on a server and used on other machines)
  • Clean uninstall
  • Sensible upgrade patterns
  • Option to choose install location
  • Minimal dependencies on other software
  • Minimal scattering of data around the system (don't dump stuff in /etc, /usr/lib, /var/adm, ...)
  • No ever-growing logs
  • Silent install
  • Scripted install
  • Online documentation (on the machine as well as on the internet)
  • Man pages perhaps
  • Easy to configure
  • Easy to make accessible to end users
  • No security risks
  • No special users or groups (or limited number - at most one special user, one special group is a target, though not always attainable)
  • Either no 'phone home' functionality or only if explicitly configured (must not be default)
  • Good logging of diagnostics when there is a problem
  • Good tech support available if there is a problem
  • No requirement to get activation code during install
  • No requirement to reboot the machine after an install
  • Ability to parallel run old and new versions
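
As an illustration of the "no ever-growing logs" and "good logging of diagnostics" items, here is a minimal Python sketch using the standard library's `RotatingFileHandler`. The logger name `exampleapp`, the helper `build_logger`, and the size limits are assumptions made up for this example, not part of any particular product; the idea is only that the application itself caps how much disk its logs can consume.

```python
import logging
import logging.handlers


def build_logger(log_path, max_bytes=1_000_000, backup_count=5):
    """Return a logger whose on-disk footprint is bounded.

    At most (backup_count + 1) files of roughly max_bytes each are kept;
    the oldest backup is discarded on every rollover.
    """
    logger = logging.getLogger("exampleapp")  # hypothetical application name
    logger.setLevel(logging.INFO)

    # Drop handlers from an earlier call so lines are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    # Timestamp, severity, and source make a line useful for diagnosis.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A caller would do something like `log = build_logger("/some/chosen/dir/app.log")` and then `log.info(...)`. Taking the path as a parameter also leaves the log location up to the administrator instead of hard-coding it.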

A lot depends on what the software is and how it is used. The requirements for a GUI program that works on Windows, Linux and MacOS X are radically different from the requirements for a network daemon - but the goal should still be stable, reliable, easily managed software.

Bear in mind that there are big differences between software prepared by an in-house department for use within one company and software prepared for use by customers external to the company that develops the software.

Jonathan Leffler answered Oct 21 '22 08:10


When a problem inevitably occurs, pay attention to what the sysadmin says and believe him. Don't just dismiss it out of hand if it doesn't fit with your initial assessment.

War story: About six years ago, I was the sysadmin for a smallish manufacturing company that decided to buy some software to handle scheduling of preventive maintenance on its equipment. One of the software's features was importing maintenance requests from email, but we had occasional errors talking to the mail server during this process, and I was eventually called in to look at it during a phone call with the developer. The conversation involved multiple iterations of:

Developer: I've never heard of anyone having that kind of trouble talking to the mail server. It has to be a firewall issue.

Me: I'm logged into the firewall, running a packet sniffer, and watching your app's traffic pass through without any problems. It's getting through the firewall just fine.

Developer: No, no - it has to be a firewall issue.

(In the end, it turned out that the app opened a POP3 connection, read all the mail, waited for the user to schedule the tasks, and only then sent a POP command to delete the mail once all requests had been scheduled. If the user took more than 15 minutes to do the scheduling, the POP connection timed out and the app wasn't able to recover, so it died instead. And then the user had to repeat the scheduling, meaning it would probably take long enough to time out again...)
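
One way an application could avoid this failure mode is to never hold the mail connection open while waiting on a human. The sketch below is an assumption about a possible design, not the vendor's actual fix. It uses Python's standard `poplib`, and the function names (`connect`, `fetch_requests`, `delete_scheduled`) and the host and credentials are hypothetical. Messages are identified by their UIDL values, so a second short session can delete exactly the ones that were scheduled.

```python
import poplib


def connect(host, user, password):
    """Open a short-lived POP3 session (host and credentials are placeholders)."""
    conn = poplib.POP3_SSL(host, timeout=30)
    conn.user(user)
    conn.pass_(password)
    return conn


def fetch_requests(open_session):
    """Download every message keyed by its UID, then close the session at once."""
    conn = open_session()
    try:
        _, listing, _ = conn.uidl()
        messages = {}
        for entry in listing:
            # Each UIDL entry looks like b"<message number> <unique id>".
            number, uid = entry.decode().split(" ", 1)
            _, lines, _ = conn.retr(int(number))
            messages[uid] = b"\r\n".join(lines)
        return messages
    finally:
        conn.quit()  # nothing was marked for deletion, so the mail stays put


def delete_scheduled(open_session, scheduled_uids):
    """Reconnect after the user is done and delete only the handled messages."""
    conn = open_session()
    try:
        _, listing, _ = conn.uidl()
        deleted = []
        for entry in listing:
            number, uid = entry.decode().split(" ", 1)
            if uid in scheduled_uids:
                conn.dele(int(number))
                deleted.append(uid)
        return deleted
    finally:
        conn.quit()  # QUIT is what makes the server apply the deletions
```

In use, `open_session` could be `functools.partial(connect, "mail.example.com", "user", "secret")`. The user can then take as long as they like between `fetch_requests` and `delete_scheduled`, because no connection is open in the meantime to time out.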

Dave Sherohman answered Oct 21 '22 08:10