I am not an IT admin; I'm a software developer (Microsoft stack), and I'm trying to understand what is wrong with the IT environment of one of our customers.

We have deployed our client/server solution to a medium-sized business. The problem is that the customer's IT environment (mostly various types of Microsoft servers: SQL Server, SharePoint, Lync, IIS, etc.) appears to be extremely chaotic and flaky. There constantly seems to be one system or another failing because an admin has reconfigured something on a server in a way that affects our software running on it. It is taking up a lot of support time to keep going in, only to find that an admin has changed some setting on a server that affects our solution, rather than anything directly to do with our software.

It's not just our software either; it seems to be going on across all their systems, and the admins seem to be constantly firefighting. No sooner are all the dominoes standing than someone changes something that knocks one down again...

I'm going to have a chat with their IT manager, but I'm not hugely knowledgeable about IT admin practices.

What needs to be looked at or questioned? In the IT Admin world is there any kind of best practice or process that can address this? Other suggestions?

  • `How to debug and prevent flaky unreliable IT environment?` - Hire competent system admins, give them sufficient budget, and follow their directions within reason. – Zoredache May 17 '13 at 17:38
  • This post is fairly argumentative. – gWaldo May 17 '13 at 17:41
  • The paint color on my neighbor's house clashes with mine. Every time I repaint my house to match his, he repaints his and I have to repaint mine all over again. I'm planning on talking to him about it, but I'm not an interior/exterior designer. What is the best practice regarding paint color? – joeqwerty May 17 '13 at 17:42
  • How do I tell an incompetent manager how to manage his incompetent people? – mfinni May 17 '13 at 17:46

3 Answers


Generally reliability in IT is provided by a few different practices, namely:

  • Access control
  • Change management
  • Configuration management
  • Revision control
  • Secret Sauce

Access control is simply limiting who can make changes to critical/production systems. Change management is generally handled through access control and a ticketing system: requests must be approved by someone higher up before the change can be made. Configuration management ensures the consistency of systems by using an external tool to tightly control all of their configuration parameters. This is generally achieved with Group Policy or tools like Puppet, Chef, etc. Revision control provides a history of the configuration, so a bad change can be traced and rolled back.
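To make the configuration management and revision control points a bit more concrete, here is a minimal sketch of drift detection: comparing the settings a server is supposed to have against what it actually has. The setting names and values are made up for illustration, and `find_drift` is a hypothetical helper, not part of Group Policy, Puppet, Chef, or any other tool. Real tools do this (and the remediation) for you.

```python
from typing import Any


def find_drift(
    baseline: dict[str, Any], actual: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return every setting whose actual value differs from the baseline.

    Each entry maps a setting name to (expected, found). A setting that is
    missing on one side is reported with None on that side.
    """
    drift = {}
    # Walk the union of keys so settings added or removed by hand show up too.
    for key in sorted(baseline.keys() | actual.keys()):
        expected = baseline.get(key)
        found = actual.get(key)
        if expected != found:
            drift[key] = (expected, found)
    return drift


# Hypothetical approved baseline, as it might be kept in revision control.
baseline = {"app_pool_identity": "svc_app", "max_connections": 200, "tls_enabled": True}

# Hypothetical settings read from the live server after an unannounced change.
actual = {
    "app_pool_identity": "svc_app",
    "max_connections": 50,
    "tls_enabled": True,
    "debug_mode": True,
}

for name, (expected, found) in find_drift(baseline, actual).items():
    print(f"{name}: expected {expected!r}, found {found!r}")
```

If the baseline lives in revision control and only changes through an approved ticket, any drift reported by a check like this points at a change that bypassed the process, which is exactly the kind of thing worth raising with the IT manager.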

The Secret Sauce is an IT team that knows what the hell it's doing. All of the process and protocols in the world can't make up for bad judgment and inexperienced/untalented engineers.

Joel E Salas

The best process to follow would be "hire someone competent" imho. If a sysadmin team is constantly firefighting and making no efforts to structurally improve their environment, I would consider them not fit for the job.

Dennis Kaarsemaker

You hire non-flaky, reliable staff. There is no other way.

Sounds like they have an incompetent admin. The admin may simply be overworked, not testing, and making mistakes along the way.

You could buy their admin The Practice of System and Network Administration. Or they could hire someone who already knows this stuff.
