The IT department anti-pattern

Running a reliable production IT system should not be exciting; it should be a predictable matter. But designing it and engineering the right solution should be an exciting time. Let’s put the adventure back in the right place.

Undoubtedly, one of the most repeated mantras in the industry, and in our opinion one of the most harmful, is the one that says ‘if it works, do not fix it’. It has almost reached the status of an axiom, something so deeply rooted in IT that it is almost assumed to be true. You can read it in specialized forums and in discussions between enthusiasts, but it is also a common theme in engineering meeting rooms in big corporations, sometimes even in technology companies. Any new company, newly formed IT department or project is prone to following it: they work hard to reach that initial working state and, once it works, it stays there. Good enough to run the show. Day one of forever legacy.

This is just the first step into a spiral of bad decisions, where the next one is normally to invest in tools before investing in a design. Sure, it is much easier and faster to convince someone to buy a magical tool, framework or service that promises to sort out a specific problem than to assign people to describing the problem and working on a redesign that solves it. So, once the first limitations of the system start to hit, rushed decisions are made in panic to palliate the situation. Throw money at the problem, on top of the existing system that is not going to be touched, because it works. But nobody can fully tell any longer why it works or how. Or what would happen if a part of the system changed or disappeared.

At the next level arrives the lethargic state of ‘we are too busy to improve’. This is the state where most organizations stay for way too long, and many would call such a state normality for an IT department. It basically consists of having to manage a system that was never properly designed, that grew organically to be good enough to run the show and was patched up by adding expensive tooling on top.

It tends to look like an amalgam of tools, services, portals, monitoring solutions, home-brewed code, etc., each one normally intimately linked to a person who mastered it at some point, and with overlapping functionality. Each one contains information that can be found in other tools, but only partially and not in the same format. And each one represents huge technical debt. Not only that: personnel will be limited to owning one system and will entrench themselves there. Over time they will specialize in that one tool and become indispensable. See the ‘MS Exchange guy’, the ‘UNIX guru’, the ‘SQL wizard’, the ‘coding monkey’ and the ‘Mac man’. And the trench war is set. Everything is someone else’s fault.


And that is it. The show runs on a system that grew spoiled by cash, from a few systems put together into a complex monster. Probably some of the people originally responsible for it are still running it, and some of them will even be proud of it; of course, no parent would admit to having an ugly child. Every rushed solution, every hack, every newly purchased tool added on top of the existing ones to solve a single issue is technical debt that will need to be supported and will be difficult to decommission. It will still have to be managed and have personnel assigned to it. After some time, entire teams are not able to tell why they use a certain tool or follow a certain process; they just do it because they were told to by someone else who did not have the complete answer either, or maybe it was part of a handover. Have you ever heard of the monkey and the ladder experiment?

And here comes the joke: the IT department, you see, should be the perfect example inside any organization of good design, efficiency and structure. Instead, the IT department of any organization you pick at random would most likely be in that third state I described before: too busy managing a messy environment to even consider improving it.

This environment will produce bottlenecks from the very beginning: procedures and staff that become indispensable for any activity in the organization and that are, most times inadvertently, a constraint on the very organization they serve. In most cases these human constraints also become dependent on their job, making themselves unemployable in any other organization and becoming addicted to fire-fighting and to being the hero of the day, every day. The most talented person in the organization ends up setting the pace for the rest of it, as they will be needed in every important project, task or incident. This is very well illustrated by Gene Kim in the novel “The Phoenix Project”.

How can we fix this? Well, organically it is not possible. It is very unlikely that a badly designed system can be fixed from the inside in a relatively short period of time or without dedicating lots of resources to it. This approach faces some basic impediments:

Daily operations are in the way. It is very difficult to fix something that is in motion.
There is internal resistance, mostly from the human constraints.
There is no strong support from management. Why are you fixing something that works?
It will need to support legacy systems and code for a long time.
It will give the impression that the change is costing lots of money.
It will not produce immediate results.

The strongest piece of resistance is time. The argument is that ‘we do not have time to invest in this improvement’. There is time, however, to implement the same fix over and over and never resolve the root cause.

What would instead be a good approach to make that change easy? Do what clever companies have done to successfully accommodate change and disruptive technology: spin off a new department. When a disruptive technology appears and large organizations cannot include it in their already established production lines, they spin off a new company to do so. This allows the parent company to continue its business as usual without being interrupted by the new structure. When ready and matured, the new technology is phased in and tested, and can eventually take over the market once held by the previous one.

Let’s say you take 10% of the IT budget and 10% of the personnel, choosing those who are interested in rewriting the department’s story into a brighter one. Have them talk, draw and design. And have them create a master document containing a vision that answers these questions:

What is our mission?
How do we know we are fulfilling the mission correctly? What are the indicators?
What is the minimum viable system we need to fulfill the mission?
How will we handle growth?
How are we testing that our system is correct?
What should be the system life cycle?

Do not ignore good practice frameworks like ITIL. Do not expect DevOps methodologies alone to save the day. Combine them, and use DevOps methodology to implement ITIL practices. Then design the system around that perfect vision.

Have a CMDB
Have a Knowledge Base
Create a strong first line support team
Define the Infrastructure as Code
Fully automate OS deployments
Package all software and scripts
Use strong naming conventions
Have strict coding rules
Have a Configuration management system integrated with the CMDB
Strive to have a data-driven infrastructure
Create a process that re-evaluates the system
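
To make a few of these points more tangible, here is a minimal Python sketch of how a strong naming convention, a CMDB and configuration management could feed each other. Everything in it is an assumption made for illustration: the `<site>-<role>-<index>` hostname format, the `ConfigurationItem` fields and the `ROLE_PACKAGES` mapping are hypothetical, and a real CMDB would be a service rather than a Python list.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention: <site>-<role>-<two-digit index>, e.g. "lon-web-01".
HOSTNAME_PATTERN = re.compile(r"^(?P<site>[a-z]{3})-(?P<role>[a-z]+)-(?P<index>\d{2})$")

# Hypothetical mapping from role to the packages that role needs.
# In a real setup this would live in version control next to the rest of the code.
ROLE_PACKAGES = {
    "web": ["nginx"],
    "db": ["postgresql"],
}


@dataclass(frozen=True)
class ConfigurationItem:
    """One CMDB record describing a server (fields are illustrative)."""
    hostname: str
    owner: str
    environment: str


def parse_hostname(hostname: str) -> dict:
    """Split a hostname into its parts, rejecting names that break the convention."""
    match = HOSTNAME_PATTERN.match(hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} breaks the naming convention")
    return match.groupdict()


def desired_config(item: ConfigurationItem) -> dict:
    """Derive the configuration of a host purely from its CMDB record."""
    parts = parse_hostname(item.hostname)
    return {
        "hostname": item.hostname,
        "site": parts["site"],
        "role": parts["role"],
        "environment": item.environment,
        "owner": item.owner,
        "packages": list(ROLE_PACKAGES.get(parts["role"], [])),
    }


# A stand-in for the CMDB: in this sketch it is just a list of records.
cmdb = [
    ConfigurationItem("lon-web-01", "web-team", "production"),
    ConfigurationItem("lon-db-01", "data-team", "production"),
]

for item in cmdb:
    print(desired_config(item))
```

The point of the sketch is that nobody edits a server by hand: the record in the CMDB is the only input, and a name that does not follow the convention is rejected before it can reach production.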

And this is where the ITIL framework and the DevOps methodology align.


The normal IT department today should require a minimum of human intervention for changes to production. From development to staging, there should be a seamless, straight line. Projects are born from a dialogue with business analysts, developers, architects, sysadmins and support, all at once. Anybody who owns a business function in the organization should be responsible for managing its configuration through simple data changes. Any change in application code, infrastructure code or network code should be easy for anyone in the department to follow. New systems should be spun up at the touch of a button, and be easy to manage through data entry, given the appropriate permissions.
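
As an illustration of what ‘managing configuration through simple data changes’ could mean in practice, the following Python sketch compares the current state of a set of services with the desired state described as plain data, and works out what has to change. The service names, the fields and the `plan_changes` function are hypothetical; it is a sketch of the idea, not the way any particular tool does it.

```python
def plan_changes(current: dict, desired: dict) -> list:
    """Return the actions needed to move `current` to `desired`.

    Both arguments map a service name to its settings. The result is a
    list of (action, name, settings) tuples, ordered as creates, removes
    and then updates, each group sorted by name.
    """
    actions = []
    # Services that are described in the data but do not exist yet.
    for name in sorted(desired.keys() - current.keys()):
        actions.append(("create", name, desired[name]))
    # Services that exist but are no longer described in the data.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("remove", name, current[name]))
    # Services that exist but whose settings differ from the data.
    for name in sorted(current.keys() & desired.keys()):
        if current[name] != desired[name]:
            actions.append(("update", name, desired[name]))
    return actions


# What is running today (illustrative values).
current = {"shop-frontend": {"instances": 2, "version": "1.4"}}

# What the business owner entered as data: one more frontend instance
# and a brand new reporting service.
desired = {
    "shop-frontend": {"instances": 3, "version": "1.4"},
    "shop-reports": {"instances": 1, "version": "0.9"},
}

for action in plan_changes(current, desired):
    print(action)
```

The owner of the business function only edits the `desired` data. Turning the resulting plan into real changes is the job of the automation, which is what keeps human intervention in production to a minimum.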

Once the system is in place, start migrating the least critical systems, then test and refine. Keep doing so until all legacy is dealt with. The trick is not to use the new infrastructure to manage legacy systems, and to work hard to engineer a complex system that allows changes to happen in an easy way.

At Netdevops, our core value is making change easy for IT departments by bringing our expertise in automation, configuration management, ITIL, software development, continuous integration and system administration to your organization. We work hard to make it easy for your IT department to be a change-friendly place.
