Episode 53: Problem Management and Root Cause Analysis

Welcome to The Bare Metal Cyber CISA Prepcast. This series helps you prepare for the exam with focused explanations and practical context.
Problem management is the structured process of identifying and addressing the root causes of recurring or significant IT incidents. Unlike incident response, which focuses on immediate recovery, problem management seeks long-term solutions that prevent future disruptions. Problems may be identified from a pattern of repeated incidents, a single high-impact outage, or a combination of operational anomalies. The goal is to reduce the volume and severity of incidents over time by understanding and eliminating their underlying causes. CISA candidates must be able to distinguish clearly between incident management, which is reactive and fast-paced, and problem management, which is methodical, analytical, and focused on prevention rather than just recovery.
Problem management includes both reactive and proactive strategies. Reactive problem management begins after a major incident or a cluster of similar issues reveals an underlying weakness in the system or process. Proactive problem management, on the other hand, relies on trend analysis, early warning indicators, and ongoing system reviews to identify risks before they result in significant outages. Both approaches are essential for comprehensive IT governance. Auditors should examine whether the organization supports both reactive and proactive workflows, tracks their outcomes, and allocates resources to sustain them. On the CISA exam, candidates may be tested on their ability to spot missed opportunities for proactive problem resolution or to identify situations where reactive measures failed due to insufficient analysis or follow-up.
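As a rough illustration of the trend analysis behind proactive problem management, the Python sketch below counts incidents by component and symptom over a recent window and flags any pair that repeats often enough to deserve a problem record. The field names, the 30-day window, and the threshold of three are illustrative assumptions, not values drawn from any standard or from the exam content.

```python
from collections import Counter
from datetime import date, timedelta


def recurring_candidates(incidents, as_of, window_days=30, threshold=3):
    """Return (component, symptom) pairs seen at least `threshold` times in the window."""
    cutoff = as_of - timedelta(days=window_days)
    counts = Counter(
        (inc["component"], inc["symptom"])
        for inc in incidents
        if cutoff <= inc["opened"] <= as_of
    )
    return sorted(pair for pair, n in counts.items() if n >= threshold)


# Hypothetical incident log; every identifier and value here is made up.
incidents = [
    {"id": "INC-101", "component": "mail-gateway", "symptom": "queue backlog", "opened": date(2024, 3, 4)},
    {"id": "INC-107", "component": "mail-gateway", "symptom": "queue backlog", "opened": date(2024, 3, 12)},
    {"id": "INC-113", "component": "vpn", "symptom": "login timeout", "opened": date(2024, 3, 15)},
    {"id": "INC-120", "component": "mail-gateway", "symptom": "queue backlog", "opened": date(2024, 3, 27)},
    {"id": "INC-090", "component": "vpn", "symptom": "login timeout", "opened": date(2024, 1, 9)},
]

candidates = recurring_candidates(incidents, as_of=date(2024, 3, 31))
print(candidates)
```

With this sample data only the mail-gateway queue backlog pair is flagged, because the VPN incidents are too few inside the window. An auditor would look for evidence that a review of this kind happens on a schedule and that flagged patterns actually become problem records.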
The lifecycle of a problem follows a clear set of phases. Detection is the initial step, where the issue is identified—often through incident patterns, monitoring alerts, or stakeholder complaints. Once detected, the problem must be logged into a tracking system with a unique identifier and relevant background information. Diagnosis follows, where teams conduct root cause analysis and assess the scope and impact of the issue. The resolution phase involves implementing permanent corrective actions and documenting the process. Finally, closure confirms that the solution was effective, that stakeholders are informed, and that knowledge has been captured for future reference. Auditors review this lifecycle to ensure that each phase is properly documented and that no problems are closed prematurely without full resolution and verification.
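One way to picture the lifecycle just described is as a small state machine in which a record can only move forward one phase at a time and cannot be closed until the fix has been verified. The Python sketch below is a minimal model under those assumptions; the class name, the phase labels, and the `fix_verified` flag are hypothetical and do not come from any particular ticketing tool.

```python
from dataclasses import dataclass, field

# Phase names follow the lifecycle described above; the labels are illustrative.
PHASES = ["detected", "logged", "diagnosed", "resolved", "closed"]


@dataclass
class ProblemRecord:
    problem_id: str
    phase: str = "detected"
    fix_verified: bool = False
    history: list = field(default_factory=lambda: ["detected"])

    def advance(self):
        """Move to the next phase, refusing premature closure."""
        idx = PHASES.index(self.phase)
        if idx == len(PHASES) - 1:
            raise ValueError("problem is already closed")
        next_phase = PHASES[idx + 1]
        if next_phase == "closed" and not self.fix_verified:
            raise ValueError("cannot close: fix has not been verified")
        self.phase = next_phase
        self.history.append(next_phase)
        return self.phase


record = ProblemRecord("PRB-0001")   # hypothetical identifier
for _ in range(3):
    record.advance()                 # logged, diagnosed, resolved

try:
    record.advance()                 # closure attempted before verification
except ValueError as err:
    print(err)

record.fix_verified = True
record.advance()
print(record.phase, record.history)
```

The `history` list stands in for the documentation trail an auditor would expect to see for each phase, and the guard on closure mirrors the control against closing problems without verification.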
Several tools and techniques support effective root cause analysis, which is at the core of problem management. The “Five Whys” method involves repeatedly asking why an issue occurred to trace it back to its source. Fishbone diagrams, also known as Ishikawa diagrams, visually map possible cause categories such as people, processes, technology, or environment. Fault Tree Analysis creates a logical diagram of how multiple failures might have combined to trigger an incident. Another common approach is change correlation, which examines whether recent updates or modifications align with the timing of a new issue. Auditors assess whether RCA documentation demonstrates depth of analysis, whether it includes supporting evidence, and whether the identified root cause logically connects to the observed symptoms. CISA candidates should know how to evaluate RCA quality and how to apply each technique in audit or exam scenarios.
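Change correlation is the easiest of these techniques to show in code. The Python sketch below lists the changes implemented shortly before an incident began, most recent first. The record fields, the 24-hour lookback, and the sample change log are all assumptions made for illustration, and a timing match is only a lead to investigate, not proof of cause.

```python
from datetime import datetime, timedelta


def correlated_changes(incident_start, changes, lookback_hours=24):
    """Return changes implemented within the lookback window before the incident."""
    window_start = incident_start - timedelta(hours=lookback_hours)
    hits = [c for c in changes if window_start <= c["implemented"] <= incident_start]
    # Most recent change first, since it is usually examined first.
    return sorted(hits, key=lambda c: c["implemented"], reverse=True)


# Hypothetical change log; identifiers and descriptions are made up.
changes = [
    {"id": "CHG-1", "summary": "firewall rule update", "implemented": datetime(2024, 5, 10, 13, 45)},
    {"id": "CHG-2", "summary": "database patch", "implemented": datetime(2024, 5, 9, 22, 0)},
    {"id": "CHG-3", "summary": "certificate renewal", "implemented": datetime(2024, 5, 7, 9, 0)},
    {"id": "CHG-4", "summary": "log rotation tweak", "implemented": datetime(2024, 5, 10, 16, 0)},
]

incident_start = datetime(2024, 5, 10, 14, 30)
suspects = correlated_changes(incident_start, changes)
print([c["id"] for c in suspects])
```

Here CHG-1 and CHG-2 fall inside the window, CHG-3 is too old, and CHG-4 happened after the incident started. In an RCA document, a list like this would count as supporting evidence only when paired with an explanation of how the change could produce the observed symptoms.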
Problem records often trigger changes in related processes, systems, or documentation, so it's important to understand how problem management links to other IT service workflows. A properly diagnosed problem typically results in one or more change requests designed to fix the root cause. Once the fix is implemented, support materials such as user guides, FAQs, and knowledge base entries must be updated to reflect the change. Major problems may also require broader incident trend analysis to confirm that the issue has been fully resolved across the environment. Coordination with configuration management ensures that assets associated with the problem are updated accordingly. CISA exam questions often focus on this integration—how well an organization ties together its incident, problem, and change records for full lifecycle control.
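The integration described here can be tested as a simple traceability check across the three record types. The following Python sketch assumes a hypothetical export in which incidents carry a `problem_id` and problems carry a list of `change_ids`; real tools name and structure these links differently. It reports broken references and closed problems with no linked change, which are items to review rather than automatic findings, since not every problem is resolved through a change.

```python
def traceability_gaps(incidents, problems, changes):
    """Return (record_id, issue) pairs where incident/problem/change links break down."""
    gaps = []
    problem_ids = {p["id"] for p in problems}
    change_ids = {c["id"] for c in changes}

    for inc in incidents:
        pid = inc.get("problem_id")
        if pid is not None and pid not in problem_ids:
            gaps.append((inc["id"], f"references missing problem {pid}"))

    for prob in problems:
        linked = prob.get("change_ids", [])
        if prob["status"] == "closed" and not linked:
            gaps.append((prob["id"], "closed with no linked change request"))
        for cid in linked:
            if cid not in change_ids:
                gaps.append((prob["id"], f"references missing change {cid}"))

    return gaps


# Hypothetical records; all identifiers are made up.
incidents = [
    {"id": "INC-1", "problem_id": "PRB-1"},
    {"id": "INC-2", "problem_id": "PRB-9"},
    {"id": "INC-3", "problem_id": None},
]
problems = [
    {"id": "PRB-1", "status": "closed", "change_ids": ["CHG-5"]},
    {"id": "PRB-2", "status": "closed", "change_ids": []},
    {"id": "PRB-3", "status": "open", "change_ids": ["CHG-8"]},
]
changes = [{"id": "CHG-5"}]

gaps = traceability_gaps(incidents, problems, changes)
for record_id, issue in gaps:
    print(record_id, issue)
```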
Roles and responsibilities must be clearly assigned to ensure accountability in the problem management process. A designated problem manager is typically responsible for coordinating the lifecycle, tracking resolution progress, and reporting on status. Technical subject matter experts, or SMEs, are engaged during the diagnosis and resolution phases to analyze logs, trace failures, and design solutions. The service desk plays a key role in identifying recurring incidents and flagging them for escalation. Change managers become involved when the solution requires a modification to infrastructure or applications. Auditors look for role clarity, separation of duties, and documentation of who did what and when. For the CISA exam, candidates should understand which roles own each step in the process and how to identify gaps in accountability that may undermine problem resolution.
Formal review mechanisms ensure that high-impact problems are properly assessed, discussed, and resolved. Problem review boards may be convened for major or repeat issues, bringing together representatives from IT, business units, compliance, and audit. These reviews evaluate the effectiveness of the solution, identify residual risks, and determine whether new controls or training are needed. They also provide an opportunity to update documentation, communicate lessons learned, and revise governance processes. For auditors, the existence of structured reviews indicates process maturity. CISA scenarios may present post-incident conditions that require auditors to evaluate whether follow-up was adequate, whether risk was fully mitigated, and whether the organization is learning from its past issues.
Metrics are essential for monitoring problem management effectiveness and identifying areas for improvement. Organizations should track how long it takes to resolve problems, how many incidents were prevented through implemented fixes, and how often similar issues recur. Additional metrics include the number of open problem records, known errors that remain unresolved, and the volume of problems associated with particular systems, vendors, or components. Dashboards and reporting tools help make this data actionable, supporting performance review and risk mitigation. Auditors assess whether metrics are consistently collected, whether they are used in decision-making, and whether they are driving improvements. CISA candidates must understand how to interpret these metrics and link them to overall IT service maturity and audit priorities.
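To make a few of these metrics concrete, the Python sketch below computes the open problem count, the average days to resolve, and the share of closed problems that later recurred. The record layout and the `recurred` flag are assumptions for illustration; an organization would define recurrence according to its own process.

```python
from datetime import date
from statistics import mean


def problem_metrics(problems):
    """Summarize open count, average resolution time, and recurrence rate."""
    closed = [p for p in problems if p["closed"] is not None]
    open_count = len(problems) - len(closed)
    if closed:
        avg_days = mean((p["closed"] - p["opened"]).days for p in closed)
        recurrence_rate = sum(1 for p in closed if p["recurred"]) / len(closed)
    else:
        avg_days = None
        recurrence_rate = None
    return {
        "open_count": open_count,
        "avg_days_to_resolve": avg_days,
        "recurrence_rate": recurrence_rate,
    }


# Hypothetical problem records; all values are made up.
problems = [
    {"id": "PRB-1", "opened": date(2024, 1, 1), "closed": date(2024, 1, 11), "recurred": False},
    {"id": "PRB-2", "opened": date(2024, 1, 5), "closed": date(2024, 1, 25), "recurred": True},
    {"id": "PRB-3", "opened": date(2024, 2, 1), "closed": None, "recurred": False},
]

metrics = problem_metrics(problems)
print(metrics)
```

For this sample, one problem is still open, the two closed problems took 10 and 20 days for an average of 15, and half of the closed problems recurred. A high recurrence rate alongside a short resolution time would suggest that problems are being closed quickly without the root cause being addressed.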
Common problem management pitfalls introduce risk and reduce the effectiveness of the process. These include superficial or rushed root cause analysis, often due to time pressure or lack of expertise. Problems may be closed without truly addressing the cause or without verifying the fix in a live environment. Another issue is the failure to update support materials or retrain staff after changes are made. Perhaps the most significant risk is poor linkage between incident, problem, and change records, which prevents traceability and weakens auditability. Auditors are expected to recognize these red flags in documentation or workflow evidence. CISA exam questions may ask you to identify what went wrong in a scenario and how an incomplete problem management process contributed to ongoing operational issues.
To prepare for the CISA exam and to operate effectively in real-world audit settings, candidates must be able to evaluate problem management processes with a critical eye. You should know how to distinguish between short-term fixes and lasting solutions, and how to audit for documentation that proves both identification and resolution of the root cause. Auditors play a key role in asking whether the same problems keep occurring, whether the organization has fully understood the cause, and whether that understanding has been used to make permanent improvements. Strong problem management is a hallmark of a resilient and mature IT operation. It does more than stop individual incidents from recurring: it drives learning, supports risk reduction, and builds trust in the technology environment.
Thanks for joining us for this episode of The Bare Metal Cyber CISA Prepcast. For more episodes, tools, and study support, visit us at Baremetalcyber.com.
