The HiSSS of Infrastructure - Part 2

In the first installment of this series, I outlined my Infrastructure Management methodology, called HiSSS. In that first posting we talked about the concept of High Availability. In this segment I'm going to tackle the notion of Stability.

Stability is pretty self-explanatory: you don't want a system that is repeatedly tipping over. Just as we might say that an athlete has a stable stance when competing, we want our infrastructure to be strong and stable, so that it doesn't fall over and leave the customer with a bad taste in their mouth. One of the first ways to do this is to control the rate of change in our systems.

Controlling when and how things happen in our systems is often called 'change management'. We want to manage any changes that are occurring, and mitigate any risks that might impact the stability of our systems. Often this change management process relates to software being developed, but it's just as important for infrastructure management. In particular, you want very strict guidelines for when changes will be applied to systems and when new software will be rolled out. You want good processes in place for handling ordinary operational business, and procedures for how non-standard changes get prioritized and implemented.

Perhaps an example would be good here, and one that also relates to software development. In order to maintain stable systems you want to control the number of times you need to deploy an application, because every deployment is a window for risk to creep in. But, at the same time, infrastructure needs to embrace the model of Continuous Deployment, because in many shops it's the future of software development. So how do you manage two seemingly opposite demands in this situation? With good change management controls. Instead of fighting against frequent software releases, infrastructure should support a strict change model that allows software to be deployed frequently, but with the least amount of disruption. Clear procedures and timelines can make the relationship between development and infrastructure very smooth. If everyone knows that there is a monthly (or weekly) release, and everyone knows exactly how release day will work, including QA sign-offs and everything else related to a release, then the process becomes so ingrained in the workflow that velocity can be increased or decreased with minimal fuss.
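To make the idea of a known release schedule a bit more concrete, here is a minimal sketch of a gate that only allows a deployment inside an agreed window and after QA has signed off. The specific window (Thursdays, 10:00 to 12:00) and the names `in_release_window` and `can_deploy` are assumptions made up for illustration, not part of any particular tool.

```python
from datetime import datetime, time

# Hypothetical policy: releases go out on Thursdays between 10:00 and 12:00.
RELEASE_WEEKDAY = 3          # Monday is 0, so 3 is Thursday
WINDOW_START = time(10, 0)
WINDOW_END = time(12, 0)


def in_release_window(now: datetime) -> bool:
    """Return True if `now` falls inside the agreed release window."""
    return (
        now.weekday() == RELEASE_WEEKDAY
        and WINDOW_START <= now.time() < WINDOW_END
    )


def can_deploy(now: datetime, qa_signed_off: bool) -> bool:
    """A deployment needs both the release window and the QA sign-off."""
    return qa_signed_off and in_release_window(now)
```

Changing the velocity then means changing the window definition in one place, rather than renegotiating the procedure for every release.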

But all the controls need to be in place for this to work right, which leads to the second major factor in stability: automation. In order for software deployments (to continue our example) to work right, you need to make the process as automated as possible. Automation allows for repeatability, and repeatability usually means that you've achieved a level of stability. Even when tracking down bugs, being able to reproduce a bug means it's a stable bug, and much easier to locate and fix.

Every process that you want to repeat on a regular basis should revolve around pressing one button, running one command, or allowing a timed automated job to run. The amount of human intervention in any process should be the bare minimum. As many contingencies as possible should be accounted for and mitigated in advance, and then everything should be set to run with as light a touch as possible. Automation is one of those things that companies want to achieve, but it often requires going back, after the fact, to add it in. Procedures and workflows tend to develop over time, and it takes a lot of forethought to get everything automated right from the get-go. Since that often doesn't happen, it becomes easy to keep putting automation off. But that's the wrong choice to make. For a system to be stable, automation needs to play a crucial role in infrastructure management.
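As a rough illustration of the "one command" idea, the sketch below runs a fixed list of steps in order and stops at the first failure, so that every run follows the same path. The function `run_pipeline` and the step names are hypothetical; real steps would call whatever build, test, and deploy tooling a shop already uses.

```python
from typing import Callable, List, Tuple

# Each step is a name plus a function that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run each step in order and stop at the first one that fails.

    Returns a log of what happened so the run can be reviewed later.
    """
    log = []
    for name, action in steps:
        try:
            ok = action()
        except Exception as exc:  # a crashing step counts as a failure
            log.append(f"{name}: error ({exc})")
            break
        if not ok:
            log.append(f"{name}: failed")
            break
        log.append(f"{name}: ok")
    return log


if __name__ == "__main__":
    # Hypothetical placeholder steps that always succeed.
    steps = [
        ("build", lambda: True),
        ("run tests", lambda: True),
        ("deploy", lambda: True),
    ]
    for line in run_pipeline(steps):
        print(line)
```

Because the steps and their order live in one place, the same entry point can be triggered by a person, a button, or a scheduled job without the procedure drifting.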

But how do you know if your system is stable? That's the final aspect of stability I want to present: monitoring. It doesn't do much good to have a large system, with lots of moving parts, if you have no idea how those parts are functioning. Good system monitoring is key to maintaining a stable system. For basic system stability I like to refer to what I call 'short-term monitoring' (my monitoring philosophy will be a whole different blog series hehe). Short-term monitoring is the view into the current system state, and involves quick alerting on issues that need to be addressed immediately. Good short-term monitoring will often let operational staff discover problems before the customer does. Proactive fixes are ALWAYS positive. So having a good view into a system is crucial. If you don't know what's going on, how can you fix it?
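One way to picture short-term monitoring is as a check of the current readings against known limits, raising an alert for anything out of bounds or missing. The sketch below assumes a handful of made-up metric names and thresholds; `check_metrics` is a hypothetical helper, not the API of any monitoring product.

```python
from typing import Dict, List

# Hypothetical thresholds; real values depend on the system being watched.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "disk_percent": 85.0,
    "error_rate": 0.05,
}


def check_metrics(current: Dict[str, float]) -> List[str]:
    """Compare current readings to the thresholds and return any alerts."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = current.get(name)
        if value is None:
            # A metric that stopped reporting is itself worth an alert.
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts
```

Treating a missing reading as an alert is a deliberate choice here: if a part of the system goes quiet, that is usually something operational staff want to know about right away.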

The more a system evolves into a completely stable system, the better it can adapt and meet the future needs of the customer. Stability and high availability are two important keys in infrastructure management, and go a long way toward achieving that "hum" of a well-tuned engine that every infrastructure manager loves to hear.
