The HiSSS of Infrastructure - Part 2

In the first installment of this series, I outlined my Infrastructure Management methodology, called HiSSS. In that first posting we talked about the concept of High Availability. In this segment I'm going to tackle the notion of Stability.

Stability is pretty self-explanatory: you don't want a system that keeps tipping over. Just as we might say an athlete has a stable stance when competing, we want our infrastructure to be strong and stable, so that it doesn't fall over and leave the customer with a bad taste in their mouth. One of the first ways to achieve this is to control the rate of change in our systems.

Controlling when and how things happen in our systems is often called 'change management'. We want to manage any changes that occur, and mitigate any risks that might impact the stability of our systems. Often this change management process relates to software being developed, but it's just as important for infrastructure management. In particular, you want very strict guidelines on when changes will be applied to systems, or when new software will be rolled out. You want good processes in place for handling ordinary operational business, and procedures for how non-standard changes get prioritized and implemented.

Perhaps an example would help here, one that also relates to software development. To maintain stable systems, you want to control the number of times you deploy an application, because every deployment opens a window for risk to creep in. At the same time, infrastructure needs to embrace the model of Continuous Deployment, because in many shops it's the future of software development. So how do you reconcile two seemingly opposite demands? With good change management controls. Instead of fighting against frequent software releases, infrastructure should support a strict change model that allows software to be deployed frequently, but with the least amount of disruption. Clear procedures and timelines can make the relationship between development and infrastructure very smooth. If everyone knows that there is a monthly (or weekly) release, and everyone knows exactly how release day will work, from the deployment procedure to QA sign-offs and everything else related to a release, then the process becomes so ingrained in the workflow that velocity can be increased or decreased with minimal fuss.
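To make "strict guidelines on when changes will be applied" a little more concrete, here is a minimal Python sketch of a release-window gate. It assumes a hypothetical policy: a weekly window on Tuesdays between 10:00 and 14:00. Non-standard (emergency) changes may go out at other times only when a named approver signs off. The function names and the schedule are illustrative assumptions, not a prescribed process.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical policy (an assumption for illustration): one weekly release
# window on Tuesdays, 10:00 up to (but not including) 14:00 local time.
RELEASE_WEEKDAY = 1  # datetime.weekday(): Monday is 0, so 1 is Tuesday
WINDOW_START = time(10, 0)
WINDOW_END = time(14, 0)


def in_release_window(when: datetime) -> bool:
    """Return True if a standard release may be deployed at `when`."""
    return when.weekday() == RELEASE_WEEKDAY and WINDOW_START <= when.time() < WINDOW_END


def approve_change(when: datetime, emergency: bool = False,
                   approver: Optional[str] = None) -> bool:
    """Standard changes go out only inside the release window.

    Non-standard (emergency) changes may bypass the window, but only
    with a named approver attached.
    """
    if emergency:
        return approver is not None
    return in_release_window(when)
```

In a real pipeline a check like this would typically sit at the very start of the automated deployment, so the schedule is enforced by tooling rather than by memory.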

For this to work, though, all the controls need to be in place, which leads to the second major factor in stability: automation. For software deployments (to continue our example) to work right, you need to make the process as automated as possible. Automation allows for repeatability, and repeatability usually means that you've achieved a level of stability. Even when tracking down bugs, being able to reproduce a bug means it's a stable bug, and much easier to locate and fix.

Every process that you want to repeat on a regular basis should revolve around pressing one button, running one command, or letting a timed, automated job run. Human intervention in any process should be kept to a bare minimum. As many contingencies as possible should be anticipated and mitigated up front, and then everything should be set to run with as light a touch as possible. Automation is often one of those things that companies want to achieve, but it frequently requires going back after the fact to add it in. Procedures and workflows tend to develop over time, and it takes a lot of forethought to get everything automated right from the get-go. Since that often doesn't happen, it becomes easy to keep ignoring automation later on. But that's the wrong choice to make. For a system to be stable, automation needs to play a key, even crucial, role in infrastructure management.
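As an illustration of the "one button, one command" idea, the following minimal sketch chains the steps of a release into a single entry point that stops at the first failure. The step names (run_tests, build_artifact, deploy, smoke_test) are hypothetical placeholders. In practice each action would call your real build and deployment tooling rather than returning a fixed value.

```python
from typing import Callable, List, Tuple

# Each step is (name, action); an action returns True on success.
# Step names used below are hypothetical placeholders.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order, stopping at the first failure.

    Returns (succeeded, names of the steps that completed).
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            print(f"step '{name}' failed; stopping")
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    # Stand-in actions; real ones would invoke your build/deploy tools.
    steps: List[Step] = [
        ("run_tests", lambda: True),
        ("build_artifact", lambda: True),
        ("deploy", lambda: True),
        ("smoke_test", lambda: True),
    ]
    ok, done = run_pipeline(steps)
    print("release succeeded" if ok else "release halted", done)
```

Stopping at the first failed step is a deliberate choice. A half-finished release that halts cleanly is far easier to reason about than one that plows ahead after something has already gone wrong.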

But how do you know if your system is stable? That's the final aspect of stability I want to present: monitoring. It doesn't do much good to have a large system with lots of moving parts if you have no idea how those parts are functioning. Good system monitoring is key to maintaining a stable system. For basic system stability, I like to refer to what I call 'short-term monitoring' (my monitoring philosophy will be a whole different blog series). Short-term monitoring is the view into the current system state, and involves quick alerting on issues that need to be addressed immediately. Good short-term monitoring will often let operational staff discover problems before the customer does. Proactive fixes are ALWAYS positive. So having a good view into a system is crucial: if you don't know what's going on, how can you fix it?
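A bare-bones version of a short-term monitoring check might compare a snapshot of current metrics against thresholds and return any alerts that need immediate attention. The metric names and threshold values below are assumptions chosen for illustration. A real setup would pull metrics from your monitoring system and route alerts to whoever is on call.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float


# Illustrative thresholds only; sensible values depend entirely on your system.
THRESHOLDS = (
    Threshold("cpu_percent", 90.0),
    Threshold("disk_percent", 85.0),
    Threshold("error_rate", 0.01),
)


def check(current: Dict[str, float], thresholds=THRESHOLDS) -> List[str]:
    """Compare the current system state against thresholds.

    Returns a list of alert messages (empty if everything is within limits).
    A metric missing from `current` is itself reported as an alert.
    """
    alerts: List[str] = []
    for t in thresholds:
        value = current.get(t.metric)
        if value is None:
            alerts.append(f"{t.metric}: no data")
        elif value > t.max_value:
            alerts.append(f"{t.metric}: {value} exceeds {t.max_value}")
    return alerts
```

Run on a short interval (say, every minute), a check like this is what lets operations staff spot a problem before the customer does. Treating missing data as an alert is a deliberate choice: a metric that goes silent often means the thing reporting it has fallen over.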

The closer a system gets to being completely stable, the better it can adapt and meet the future needs of the customer. Stability and high availability are two important keys in infrastructure management, and together they go a long way toward achieving that "hum" of a well-tuned engine that every infrastructure manager loves to hear.

