ITIL Mythbusting: Incident Management Facts That Will Supercharge Your Processes
November 15, 2016
It’s time to set the record straight on even more prevailing ITIL misconceptions, maybe ones you’ve encountered or believed yourself (gasp!). Once again, we’re joining forces with the Queen of ITIL Mythbusting, Vawns Murphy. Vawns is an ITIL/ITSM expert and the Principal Analyst and Research Director at ITSM.Tools. If you caught our Change Management post, you know that she replaces fiction with fact and provides detailed best practices to improve your ITSM processes. This time we’re revealing truths on the top Incident Management myths she’s encountered during her 15+ years in the industry. Read on to test your knowledge on what’s fact and what’s fiction, and you’ll walk away with actionable advice to boost your Incident Management processes.
Myth 1: You need to categorize to the nth degree.
Short answer: No, nope, and no again! The purpose of Incident Categorization is to give your support teams a fighting chance to make those quick fixes, with as little adverse impact as possible. Making the categorization too complicated creates difficulties for end users logging the Incident and can cause incorrect assignment—ultimately slowing down the entire process. The key is to keep the first level of categorization simple and high level—think hardware, software, networks, voice—and make it easy on your poor users.
Time and time again I encounter IT organizations that have become too ambitious with their Incident Categorization. They have a tool that drills down to a ridiculously detailed level and they think “Sure, why not? Let’s do that!” But if you make it too detailed and too complicated, it’s actually a nightmare for both your service desk analysts to log Incidents and for your end users to report them via self-service environments or self-service forms.
The end user perspective
I was on a client site recently where there were four different levels of categorization and none of them made any sense. For example, you could log software application and then the type of application. The types included “core application” and “enterprise application.” I asked what was considered a “core application” versus an “enterprise application”? And the service desk analyst responded that he didn’t know. Now, if the service desk analyst doesn’t know, how’s the end user supposed to know what to select when they’re logging the Incident?
Additionally, their form was so detailed that to log anything, you had to scroll through a kazillion lists. From an end user’s perspective, all they want is to be able to log an Incident quickly and then walk away unscathed without any confusion or drama. They don’t want to go through four or five layers of menus, sub-menus, drop downs, radio buttons, and entry fields.
The service desk perspective
When you’re looking at Incident categorization, the goal is to get it right the first time. The objective is to fix something as quickly as possible with as little adverse impact as possible. It has everything to do with speed, and every part of the Incident Management process counts. You know the trend for marginal gains, or marginal CSI, where you improve every aspect by just 1% and see cumulative effects? This is similar: you streamline each step of the Incident process to get gains. It’s about making it easy to assign the Incident to the right person. That’s why you keep your top tier simple. The first thing you categorize by should be basic (Is it hardware? Software? A networks thing?) and then you can drill down to the details as it goes through first line support, second line support, and so on. If you make the first level too complicated, you’re going to increase the chance of users getting confused and the Incident being sent to the wrong team. If that happens, the ticket gets bounced and you start introducing delays while it’s being reassigned. Before you know it, you’ve lost 45 minutes. Losing 30-45 minutes just because your categorization is too complicated is not too great, is it? Especially when speed is such a key factor in successful Incident Management.
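To make the "keep the top tier simple" idea concrete, here is a minimal sketch of first-level routing. The four categories come from the examples above; the queue names and the fallback to a triage queue are assumptions for illustration, not part of any particular ITSM tool.

```python
from enum import Enum


class Category(Enum):
    """A deliberately small first level of categorization."""
    HARDWARE = "hardware"
    SOFTWARE = "software"
    NETWORKS = "networks"
    VOICE = "voice"


# Hypothetical queue names; substitute your own support groups.
FIRST_LINE_QUEUE = {
    Category.HARDWARE: "desktop-support",
    Category.SOFTWARE: "application-support",
    Category.NETWORKS: "network-operations",
    Category.VOICE: "telephony",
}


def route(category_label: str) -> str:
    """Return the queue for a top-level category.

    Anything the user could not classify goes to a triage queue
    (an assumed fallback) instead of being guessed at.
    """
    try:
        category = Category(category_label.strip().lower())
    except ValueError:
        return "service-desk-triage"
    return FIRST_LINE_QUEUE[category]
```

With only four choices, a user who picks "Software" lands in one queue straight away, and finer detail can be added by first or second line support later rather than demanded up front.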
Myth 2: No one cares what the Incident Form looks like.
Short answer: In the immortal words of Shania Twain, “you must be joking, right?” Your Incident Form is your way of capturing all information relevant to what’s gone wrong and how to fix it. Your form must be well designed and applied consistently so no matter who deals with the ticket or what’s being logged, the Incident is treated in the same way with nothing lost, ignored, or forgotten about.
Incident Management is one of the first things your end user will see. Everyone in the business will need to log an Incident at some stage. That means Incident Forms must be well-designed and user friendly since everyone sees them. It’s a way of setting out your stall and acting as your shop window. And it’s your way of capturing all the relevant information about the issue and how to fix it.
Your Incident Form also needs to be applied consistently. It doesn’t matter how your end users are logging an Incident—phone, online, self-service, or email—the look and feel should be consistent and the same questions, prompts, and mandatory fields need to be there. That way it doesn’t matter what time of day it’s logged, or who on the service desk is dealing with it—whether it’s Bob who has been there forever or Dave who has been there two days. Exactly the same information will be captured and the Incident will be routed in precisely the same way. This means that nothing can get lost, ignored, or forgotten about in the process.
Keep in mind that a good form will depend on the organization. Some highly regulated organizations, like those in the financial, healthcare, and public sectors, will probably need more details than others. In contrast, tech start-ups tend to be more flexible and need less detail. From an Incident Form point of view, it’s vital that you find the right balance because it’s something that has the potential to turn your Incident Management process from good to great.
The key items for any kind of Incident Form:
1. Title – aka What’s the problem? Ideally you want users to provide a short snippet that succinctly explains the performance issues they are experiencing or what’s down.
2. Description – This field allows users to provide more detailed information on what’s going on, including what the person was trying to do at the time and whether anything different was occurring.
3. What’s the impact? – This is where priority matrixes come into their own because let’s face it, every end user thinks their Incident is the end of the world. If you have a queue of Incidents and they are all priority “DEFCON 1,” which do you do first? A priority matrix also helps avoid the “If in doubt, tick ‘medium’” selections.
4. What service is affected? – I know that sounds really obvious but sometimes at the service desk when we’re under pressure and have a growing number of calls in the queue, we forget to ask the obvious questions. What is it that’s causing pain? What’s down or unavailable? What exactly is experiencing performance issues?
5. How many people are affected? – If it’s just one person who can’t access their email, that’s one thing. But if it’s a department of 500 people, that’s something else entirely. That’s when you scramble to the Batmobile, start making calls, and get people out of bed.
6. Is a workaround available? – It’s important to know if it’s a hard down or if there’s a way to work around the issue. You need to know if there’s anything you can do to mitigate the Incident while you fix it permanently.
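The key items above can be sketched as a single record with mandatory fields and a small priority matrix. This is only an illustration: the field names, the head-count thresholds, the P1 to P4 labels, and the rule that "no workaround" means high urgency are all assumptions, and a real matrix should be agreed with the business.

```python
from dataclasses import dataclass
from typing import List

# Illustrative (impact, urgency) matrix; P1 is handled first.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "low"): "P2",
    ("medium", "high"): "P2",
    ("medium", "low"): "P3",
    ("low", "high"): "P3",
    ("low", "low"): "P4",
}


@dataclass
class IncidentForm:
    title: str                  # 1. What's the problem?
    description: str            # 2. What was happening at the time?
    service_affected: str       # 4. What service is affected?
    people_affected: int        # 5. How many people are affected?
    workaround_available: bool  # 6. Is a workaround available?

    def missing_fields(self) -> List[str]:
        """Names of mandatory text fields that were left blank."""
        required = {
            "title": self.title,
            "description": self.description,
            "service_affected": self.service_affected,
        }
        return [name for name, value in required.items() if not value.strip()]

    def impact(self) -> str:
        # Hypothetical thresholds based on head count.
        if self.people_affected >= 100:
            return "high"
        if self.people_affected >= 10:
            return "medium"
        return "low"

    def priority(self) -> str:
        """3. What's the impact? Derived from the matrix, not self-reported."""
        urgency = "low" if self.workaround_available else "high"
        return PRIORITY_MATRIX[(self.impact(), urgency)]
```

Because priority is looked up rather than chosen by the user, a department of 500 with no workaround comes out as P1 while a single user with a workaround comes out as P4, whichever channel or analyst logged it.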
One final piece of advice: It’s especially important in a self-service environment to make the form as easy to navigate and to use as possible. One of the golden rules of ITIL is to always make it easy for people to use your service and to follow the process in the right way.
Myth 3: There’s no need to have a separate major Incident process.
Short answer: Repeat after me: Major Incidents are, by their very nature, the serious stuff. Major Incidents are issues that cause severe business impact, which may extend to financial, reputational, and even regulatory or legal ramifications. In other words, a serious Incident deserves a serious, appropriate process.
I know there’s a school of thought that says: “They are all Incidents and they all need to be treated exactly the same way.” But Major Incidents are different. They are the big, clunky, serious stuff. The stuff where senior management gets involved and the business starts escalating. Major Incidents are the ones that cause really severe impact and pain, and that might extend beyond your organization and result in financial, reputational, or legal ramifications:
Financial – I used to work for a trading company and there were very clear financial risks associated with Major Incidents. We had one particular application where for every 15 minutes it was down, that particular trading floor lost a million dollars. That’s a scary amount of money!
Reputation – Say you wanted to download a book or a song from Amazon, but it wasn’t working. You’d go elsewhere to iTunes or Barnes & Noble instead, right? Additionally, you’d think to yourself that Amazon clearly can’t be trusted. Amazon is all about being an easy way to shop online 24/7 and a Major Incident could completely ruin that reputation.
Legal/Regulatory – There’s been a lot of banking-related downtime recently here in the UK. That’s where regulators start to get involved and make sure everything is as it should be. Additional audits, fines, sanctions, and even legal action can be the result of a poorly handled Major Incident; make sure you’re not the example everyone uses as the cautionary tale of what not to do.
Tips for creating a Major Incident Process
Whenever I put in a Major Incident process, I make sure to create a checklist that details step-by-step what needs to be done. For example, at five minutes you have recognized that it’s a Major Incident, logged it as such, and sent a communication to the appropriate escalation teams. At ten minutes, communications have gone out to the business with an initial acknowledgement that you are aware of the Incident and the team is working on it. At 30-60 minutes, another update is sent out. And so on, and so forth until the Incident is resolved.
It’s also important to practice this checklist process because during a Major Incident, people naturally start to panic, things get dropped, and mistakes are made. This process needs to be second nature. You want to practice it enough that it just becomes muscle memory and you can do it automatically without thinking about it. It’s all about being quick and effective so that when the e-mail server goes down and you’re only minutes away from jungle law kicking in, you have a well-rehearsed plan to restore service quickly.
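The timed checklist described in this section can be sketched as a simple schedule generator, which is also handy for rehearsals. The five and ten minute steps follow the example above; the 30-minute update interval (the low end of the 30-60 minute range), the planning horizon, and the function name are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def communication_schedule(
    started_at: datetime,
    horizon_minutes: int = 120,
    update_every: int = 30,
) -> List[Tuple[datetime, str]]:
    """Return (due time, action) pairs for a Major Incident.

    Updates repeat every `update_every` minutes up to `horizon_minutes`;
    in practice you would keep going until the Incident is resolved.
    """
    if update_every <= 0:
        raise ValueError("update_every must be a positive number of minutes")

    steps = [
        (5, "Log as Major Incident and notify escalation teams"),
        (10, "Send initial acknowledgement to the business"),
    ]
    minute = update_every
    while minute <= horizon_minutes:
        steps.append((minute, "Send progress update to the business"))
        minute += update_every

    steps.sort(key=lambda step: step[0])
    return [(started_at + timedelta(minutes=m), action) for m, action in steps]


if __name__ == "__main__":
    for due, action in communication_schedule(datetime(2016, 11, 15, 9, 0), 90):
        print(due.strftime("%H:%M"), action)
```

Printing the schedule at the start of a drill gives everyone the same clock to work against, which helps the process become second nature.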
Myth 4: The users haven’t noticed yet so it’s not a real Incident.
Short answer: If only! If that ultra-critical business application is down and no one is screaming at you yet, it doesn’t mean it’s not an Incident. If something in a production environment has fallen over, then deal with it, and deal with it immediately. Otherwise, I guarantee you that when people notice and calls start getting logged, there will be screaming. Lots of it. Probably from very senior managers, which, let’s face it, never ends particularly well.
I saw this recently when IT knew something had fallen over but no one was shouting. The issue wasn’t fixed until a senior manager noticed and things started escalating. Just because there hasn’t been a call through yet to the service desk doesn’t mean it’s not still an Incident. It absolutely is! It doesn’t matter if the business has noticed or not. The idea of Incident management is to fix it, and preferably before the business notices. The sooner, the better because the longer you leave it, the worse it will get.
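One way to act on this is to let monitoring raise Incidents without waiting for a phone call. The sketch below assumes a hypothetical list of health-check results and a list of services that already have an open Incident; it is not tied to any specific monitoring or ITSM product.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HealthCheck:
    service: str
    environment: str  # e.g. "production" or "test"
    healthy: bool


def incidents_to_open(
    checks: Iterable[HealthCheck],
    already_logged: Iterable[str],
) -> List[str]:
    """Production services that are down but have no Incident logged yet."""
    down = {
        check.service
        for check in checks
        if check.environment == "production" and not check.healthy
    }
    return sorted(down - set(already_logged))
```

Whether or not a user has reported it plays no part in the decision: a failed production check with no open Incident is enough to log one.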
Plus, keep in mind that just because an Incident hasn’t been logged doesn’t mean that no one has noticed. There could be loads of reasons why it hasn’t been submitted as an Incident yet. Maybe your end users can’t get through to the service desk. Or perhaps they are busy trying a workaround on their own. Or maybe user apathy has kicked in and they figured someone else has already logged it.
In dealing with any customer-facing Incident, request, or ticket, I always think “What is my justification for not fixing this as soon as I possibly can?” If I were hauled up in front of the customer, what would I say? If I can’t explain it to myself, how am I going to explain it to the customer? And in most cases, you won’t have a good enough excuse to explain not fixing it right away, so go fix it!
Catch up with our previous blog in this series: ITIL Mythbusting: Change Management Truths You Needed Yesterday