EasyVista | March 21, 2024
In IT service management (ITSM), downtime can translate into significant losses for the company, so quickly resolving the root cause of incidents is critical to your business’s success. ITIL (Information Technology Infrastructure Library) Root Cause Analysis (RCA) is a systematic approach designed to uncover the underlying issues behind IT service disruptions. Its frameworks, methodologies, principles, and techniques center on the premise that it’s more effective to solve issues systematically and prevent them from recurring than to put out each fire as it flares up.
This blog post dives into the intricacies of ITIL RCA, its methodologies, and its significance in maintaining robust IT infrastructures.
At its core, ITIL RCA is a structured method used to determine the fundamental reasons behind incidents and problems within an IT environment. Unlike superficial fixes that merely address symptoms, RCA aims to prevent incident recurrence—enhancing the overall system reliability.
The core of RCA centers on:
There are multiple well-known methodologies used to conduct RCA. Below are 3 of the most popular methods and frameworks—used across various industries. Try all of them and see which one best fits your needs and preferences.
Fault Tree Analysis (FTA) is a top-down approach that starts from an undesired state of a system and visually represents the potential causes of that specific incident. The technique was originally developed by H. Watson and A. Mearns at Bell Laboratories for the Air Force in 1962. It was later adopted by Boeing and is now used by companies in the aerospace, chemical, and software industries to analyze reliability. By systematically breaking down events into contributing factors, FTA helps identify the root cause and its dependencies. Note that the undesired outcome, not the cause, is taken as the root of the logic tree. The fault tree is typically drawn using logic gate symbols. The basic symbols used in FTA are events, gates, and transfer symbols.
FTA Event Symbols
FTA Gate Symbols
FTA Transfer Symbols
The transfer symbols, “Transfer in” and “Transfer out,” are used to connect the inputs and outputs of related fault trees.
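To make the gate logic concrete, here is a minimal Python sketch of a fault tree with AND and OR gates. The scenario (an unavailable application, a database, and two web servers) and all class and function names are hypothetical, chosen only for illustration; a real fault tree would come from your own infrastructure.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Event:
    """A basic event (a leaf of the tree) that either occurred or did not."""
    name: str
    occurred: bool = False

    def evaluate(self) -> bool:
        return self.occurred


@dataclass
class Gate:
    """An intermediate or top event whose state is derived from its inputs."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Event", "Gate"]] = field(default_factory=list)

    def evaluate(self) -> bool:
        results = [node.evaluate() for node in self.inputs]
        if self.kind == "AND":
            return all(results)  # every input must have occurred
        if self.kind == "OR":
            return any(results)  # a single input is enough
        raise ValueError(f"Unknown gate type: {self.kind}")


def contributing_events(node: Union[Event, Gate]) -> List[str]:
    """List the basic events that explain why an active node is active."""
    if isinstance(node, Event):
        return [node.name] if node.occurred else []
    if not node.evaluate():
        return []  # an inactive gate contributes nothing
    names: List[str] = []
    for child in node.inputs:
        names.extend(contributing_events(child))
    return names


def build_tree(primary_failed: bool, failover_failed: bool, db_down: bool) -> Gate:
    """Hypothetical tree: the app is down if the database is down OR
    both the primary AND the failover web servers have failed."""
    web_tier = Gate("Web tier down", "AND", [
        Event("Primary web server failed", primary_failed),
        Event("Failover web server failed", failover_failed),
    ])
    return Gate("Application unavailable", "OR", [
        Event("Database down", db_down),
        web_tier,
    ])


if __name__ == "__main__":
    tree = build_tree(primary_failed=True, failover_failed=True, db_down=False)
    print(tree.evaluate(), contributing_events(tree))
```

Because the web tier sits behind an AND gate, a single web server failure does not trigger the top event in this sketch; only the combination (or a database outage) does.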
The 5 Whys Root Cause Analysis method is based on the idea of asking "why" multiple times to trace problems back to their origins. The technique encourages IT teams to delve beyond superficial explanations and uncover deeper underlying issues. It also helps you avoid assumptions and focus on what has actually occurred.
How to use it:
1/ Ask a question about “why something happens within your software” or “why your product does x instead of y?”
2/ For every answer to your WHY question, ask another, deeper “Ok, but WHY?” question.
TIP: A good way to think about this is to imagine you’re talking to a curious, slightly annoying child who keeps asking “Why?” after you explain something. If you’re getting annoyed at the number of whys you’re asking, you’re on the right track. The more you ask “why” and uncover the intricate parts of your IT infrastructure, the better you’ll be at finding and resolving issues to improve your security and your product.
Example
| Question | Answer |
| --- | --- |
| Why is the application running slow for users? | The server hosting the application has high CPU usage. |
| Ok. Why is the CPU utilization so high? | There is a sudden surge in concurrent user logins. |
| And why is there a surge in user logins? | A new marketing campaign launched without IT input. |
| Why didn't IT know about the campaign? | There's a lack of communication between teams. |
| Ok, and why is communication lacking? | No formal process exists for project impact analysis. |
As you can see, this is a useful informal method for pushing teams to dig deeper than the initial symptoms and figure out what is going on. At first, it makes sense for technicians to try to deal with the high CPU usage, but without understanding why it is happening in the first place, they would never resolve the actual problem, which in this case is the lack of a formal process for analyzing the impact of projects.
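The same chain can be captured as data so that it is easy to log against a problem record. The sketch below is only an illustration in Python: the `Why` class and the helper functions are hypothetical names, and the questions and answers are the ones from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Why:
    """One question/answer step in a 5 Whys chain."""
    question: str
    answer: str


def root_cause(chain: List[Why]) -> str:
    """Treat the answer to the last 'why' as the candidate root cause."""
    if not chain:
        raise ValueError("Ask at least one 'why' before looking for a root cause")
    return chain[-1].answer


def format_chain(chain: List[Why]) -> str:
    """Render the chain as numbered lines for a problem record or ticket note."""
    return "\n".join(
        f"{depth}. {step.question} -> {step.answer}"
        for depth, step in enumerate(chain, start=1)
    )


# The example chain from this article, one entry per "why".
slow_application = [
    Why("Why is the application running slow for users?",
        "The server hosting the application has high CPU usage."),
    Why("Why is the CPU utilization so high?",
        "There is a sudden surge in concurrent user logins."),
    Why("Why is there a surge in user logins?",
        "A new marketing campaign launched without IT input."),
    Why("Why didn't IT know about the campaign?",
        "There's a lack of communication between teams."),
    Why("Why is communication lacking?",
        "No formal process exists for project impact analysis."),
]

if __name__ == "__main__":
    print(format_chain(slow_application))
    print("Candidate root cause:", root_cause(slow_application))
```

The last answer is only a candidate: if it still describes a symptom, keep asking and append another step to the chain.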
The Ishikawa diagram, also known as a cause-and-effect diagram, categorizes potential causes of a problem into major groups, such as people, process, technology, and environment. This visual tool eases collaborative analysis and holistic problem-solving.
How to use it:
1/ Start with the problem at the head of the fish, with the main line (the spine of the fish skeleton) running back from it
2/ Brainstorm several categories of causes (placed on branches coming off the main line, the ribs of the fish)
3/ Break each category into smaller parts (e.g., “training” might be a potential root cause factor under “People”)
4/ Dig deeper into potential causes and sub-causes, questioning each branch to get closer to the root issue at hand
5/ Eliminate unrelated categories and identify correlated factors (i.e., root causes)
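The steps above can be sketched as a small data structure: one problem, a branch per category, and causes that are added during brainstorming and removed once they are ruled out. This is a rough Python illustration only; the `Fishbone` class, the example problem, and every cause listed are hypothetical.

```python
from typing import Dict, List


class Fishbone:
    """A minimal cause-and-effect diagram: one problem, causes grouped by category."""

    def __init__(self, problem: str) -> None:
        self.problem = problem
        self.branches: Dict[str, List[str]] = {}

    def add_cause(self, category: str, cause: str) -> None:
        """Steps 2-4: brainstorm a cause and file it under a category branch."""
        self.branches.setdefault(category, []).append(cause)

    def eliminate(self, category: str, cause: str) -> None:
        """Step 5: drop a cause that investigation has ruled out."""
        self.branches[category].remove(cause)
        if not self.branches[category]:
            del self.branches[category]  # an empty branch is no longer relevant

    def candidates(self) -> Dict[str, List[str]]:
        """Return the causes still under consideration, by category."""
        return {category: list(causes) for category, causes in self.branches.items()}


def example_diagram() -> Fishbone:
    # Step 1: state the problem (hypothetical example).
    diagram = Fishbone("Application running slow for users")
    diagram.add_cause("People", "No training on capacity planning")
    diagram.add_cause("Process", "No project impact analysis")
    diagram.add_cause("Technology", "Server undersized for peak logins")
    diagram.add_cause("Environment", "Data center cooling fault")
    # Suppose the investigation finds the cooling system was working normally.
    diagram.eliminate("Environment", "Data center cooling fault")
    return diagram


if __name__ == "__main__":
    for category, causes in example_diagram().candidates().items():
        print(f"{category}: {', '.join(causes)}")
```

Keeping the diagram as data makes it easy to revisit in a later review: causes that were eliminated simply no longer appear among the candidates.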
Common Categories to Include:
With effective RCA practices in place for your IT service management, you’ll be able to diagnose and address IT-related problems proactively, potentially saving your organization hundreds of thousands or even millions of dollars. The three steps below outline the recommended best practices for successfully implementing RCA in your organization.
ITIL Root Cause Analysis is a cornerstone of effective IT service management, enabling organizations to diagnose and address underlying issues proactively. By adopting structured RCA methodologies and fostering a culture of continuous improvement, businesses can enhance operational resilience, reduce costs, and deliver superior services to their customers. Embracing RCA is not merely about resolving incidents; it's about cultivating a mindset of problem-solving and innovation that drives long-term success in the ever-evolving landscape of IT operations.
Our 2024.1 product release includes updates to root cause analysis, digital accessibility, automated IT asset discovery, and AI capabilities. EV Discovery’s Discovery & Dependency Mapping (DDM) roadmap will help customers gain a 360-degree view of their IT landscape; automate asset and configuration management; track changes and maintain audit trails; and seamlessly integrate with EasyVista’s ITSM products. Additional dependency mapping features are expected to roll out later in 2024.
EasyVista is a global software provider of intelligent solutions for enterprise service management, remote support, and self-healing technologies. Leveraging the power of ITSM, Self-Help, AI, background systems management, and IT process automation, EasyVista makes it easy for companies to embrace a customer-focused, proactive, and predictive approach to their service and support delivery. Today, EasyVista helps more than 3,000 enterprises around the world accelerate digital transformation, empowering leaders to improve employee productivity, reduce operating costs, and increase employee and customer satisfaction across financial services, healthcare, education, manufacturing, and other industries.