Skip to main content

The HiSSS of Infrastructure - Part 2

In the first installment of this series, I outlined my Infrastructure Management methodology, called HiSSS. In that first posting we talked about the concept of High Availability. In this segment I'm going to tackle the notion of Stability.

Stability is pretty self-explanatory, simply, you don't want a system that is repeatedly tipping over. Just like we might say that an athlete has a stable stance when they're competing, we want our infrastructure to be strong and stable, so that it doesn't fall over, and leave the customer with a bad taste in their mouth. One of the first ways to do this is to control the rate of change in our systems.

Controlling when and how things happen in our systems is often called 'change management'. We want to manage any changes that are occurring, and mitigate any risks that might impact the stability of our systems. Often times, this change management process relates to software being developed, but it's just as important for infrastructure management. In particular, you want to have very strict guidelines as to when changes will be applied to systems, or when new software will be rolled out. You want good processes in place for handling ordinary operational business, and procedures for how non-standard changes get prioritized and implemented.

Perhaps an example would be good here, and one that also relates to software development. In order to maintain stable systems you want to control the number of times you need to deploy an application, because every time you do a deployment, it's a window for risk to creep in on. But, at the same time, infrastructure needs to embrace the model of Continuous Deployment, because in many shops, it's the future of software development. So how do you manage two, seemingly, opposite demands in this situation? With good change management controls. Instead of fighting against frequent software releases, infrastructure should support a strict change model that allows software to be deployed frequently, but with the least amount of disruption. Clear procedures, and timelines, can make the relationship between development and infrastructure very smooth. If everyone knows that there is a monthly (or weekly) release, and everyone knows exactly how the procedure will work on release day, QA sign-offs, and everything related to a release, then it becomes so ingrained to the workflow that velocity can be increased or decreased with minimal fuss.

But all the controls need to be in place for this to work right which leads to the second major factor in stability, the notion of automation. In order for software deployments (to continue our example) to work right, you need to make the process as automated as possible. Automation allows for repeatability, and repeatability usually means that you've achieved a level of stability. Even when tracking down bugs, being able to repeat a bug, means it's a stable bug, and much easier to locate and fix.

For every process that you want to repeat on a regular basis, it should revolve around pressing one button, running one command, or allowing a timed automated job to run. The amount of human intervention in any process should be bare minimum. As many possible contingencies should be accounted for and mitigated preventatively, and then everything should be set to run with as light a touch as possible. Automation is often one of those things that companies want to achieve, but it often requires going back, after the fact, to add it in. Many times, procedures and workflows develop over time, and it takes a lot of forethought to get everything automated right from the get go. Since that often doesn't happen, it gets easy to ignore it in the future. But that's the wrong choice to make. For a system to be stable, automation needs to play a key role, a crucial role, in infrastructure management.

But how do you know if your system is stable? That's the final aspect I want to present regarding stability, the notion of monitoring. It doesn't do much good to have a large system, with lots of moving parts, if you have no idea how those parts are functioning. Good system monitoring is key to maintaining a stable system. For basic system stability I like to refer to what I call 'short-term monitoring' (my monitoring philosophy will be a whole different blog series hehe). Short-term monitoring is the view into the current system state, and involves quick alerting of issues that need to be addressed immediately. Good short-term monitoring will often let operational staff discover problems before the customer. Pro-active fixes are ALWAYS positive. So having a good view into a system is crucial. If you don't know what's going on, how can you fix it?

The more a system evolves into a completely stable system, the better it can adapt and meet the future needs of the customer. A stable system, along with a highly available one, are two important keys in infrastructure management, and go a long way to achieving that "hum" of a well tuned engine that every infrastructure manager loves to hear.

Comments

Popular posts from this blog

The beat goes on

Yesterday Apple revealed their long awaited entry into the streaming music field. They were able to do this quickly because of the acquisition of Beats last year, and the systems and intellectual property that came with that purchase. Considering that the music reveal was pretty much the only big news out of a pretty benign developer keynote, I'll take a few moments to talk about what I think about it. Apple was perhaps the defining company in the music revolution of the past 20 years. With the introduction of the iPod that revolutionized portable music, to the creation of the iTunes store and the eventual death of DRM, Apple has been at the forefront of digital music. This leadership comes with high expectations to continue to lead, and so many people have long questioned Apple not getting into the streaming music business quicker. For the past few years new companies have come forth to lead the change in the streaming music evolution. From Pandora and its ability to create un

The NEW Microsoft

Today Microsoft held their Build conference keynote. As with Apple and Google, developer conference keynotes have become a mainstay of announcements for the general public beyond developers. At first it seemed that Microsoft would be bucking that trend today as the first portions of their keynote were very, very developer centric. However, a lot changed when they started talking about Windows 10. Microsoft is betting the future on building a platform that applications will build off of. Much like Apple and Google, they seem to be discovering that the real money isn't in the operating system itself, but in helping bring applications to consumers through validated app stores. In Microsoft's case it's also seeking to converge all of their platforms into a single unified platform. They once again reiterated today that Windows 10 will run on all of the devices that are out there, from phones to tablets to PC's to XBox game consoles. This means that applications can be writ

CES 2015 quick notes

One of the fun technology events every year is the Consumer Electronic Show. I've never had the opportunity to attend this in person, but maybe now that I have family in Vegas I should try and make it out some year. CES is a huge event that highlights some of the cool and crazy stuff that all the big consumer electronics companies are working on, and attempting to bring to market. Since I've been laid up sick for the past day and a half, I've been catching up on the news feeds of all the stuff that's currently coming out. Although CES isn't strictly laptop and computer focused, computer companies still play a major role. This year, I'm seeing a lot of emphasis on thin and light computing devices. ASUS and Lenovo  have both released some exceptionally light weight laptops, and hybrid tablets, that give the MacBook Air line a run for it's money. Additionally, HP is building off the success of it's Stream line of Chromebook competitors with an HP Stream