Friday, May 25, 2018

Incident Management vs Problem Management




Incident Management vs Problem Management: Is there a Difference?

ITIL has been around since the late 1980s, and we are currently on version three (v3). There are plenty of books and courses about ITIL, but there is still real confusion about where Incident management stops and Problem management begins. If this were only a matter of terminology I wouldn’t be so worried about it, but not knowing the difference between these two processes can have a serious negative impact, not only on your infrastructure but on your business as a whole.

Here is what each term means according to ITIL, and how each definition translates into how quickly a fix is needed:

What is an Incident?
An incident is an event that leads to an unplanned disruption of service. The important part to remember is ‘disruption of service’: if an issue does not disrupt service, even if it was unplanned and unexpected, it is not an incident. For example, if a piece of hardware fails after hours when nobody is using the system, it is not an incident, because it did not disrupt service. If the same equipment failed during the regular workday, it would be an incident, because service was in fact disrupted. The IT help desk is often the first to be made aware of an incident, as it is usually the first point of contact for users experiencing issues with the system.

What is Incident Management?

Picture Courtesy: http://www.seriosoft.com/

The main goal of incident management is to resolve the disruption and restore normal service operation as quickly as possible, within the targets set by Service Level Agreements. The process is primarily aimed at the user level.
Because even minor disruptions in service can have a huge impact on the organization, incidents need to be fixed immediately. The process of incident management usually includes recording the details of the incident and resolving it.
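
To make the "record it, resolve it, meet the SLA" idea concrete, here is a minimal sketch in Python. It is not taken from ITIL or from any particular service desk tool: the Incident class, its field names, and the four-hour target are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    """A hypothetical, minimal incident record (field names are illustrative)."""
    incident_id: str
    summary: str
    opened_at: datetime
    sla_target: timedelta                  # agreed maximum time to restore service
    resolved_at: Optional[datetime] = None

    def resolve(self, when: datetime) -> None:
        """Record the time at which service was restored."""
        self.resolved_at = when

    def breached_sla(self, now: datetime) -> bool:
        """True if restoration took (or is still taking) longer than the target."""
        end = self.resolved_at if self.resolved_at is not None else now
        return end - self.opened_at > self.sla_target


opened = datetime(2018, 5, 25, 9, 0)
inc = Incident("INC-001", "Mail server unreachable", opened, timedelta(hours=4))
inc.resolve(datetime(2018, 5, 25, 11, 30))
print(inc.breached_sla(now=datetime(2018, 5, 25, 18, 0)))  # False: restored in 2.5 hours
```

Notice that the record says nothing about why the mail server went down. That is deliberate: incident management only needs to know when service was lost and when it came back.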

What is a Problem?
Also according to ITIL, “a problem is a cause of one or more incidents”. The cause is usually unknown at first, and it comes to light through a number of related incidents that share common characteristics. Problems are not themselves incidents, but incidents can raise problems, especially if they recur or are likely to recur. To return to our example above: the server that is only used during the day crashing after office hours is a problem, because although it isn’t currently disrupting service, it could happen again and become an incident.

What is Problem Management?
Picture Courtesy: http://www.seriosoft.com/

The goal of problem management is to identify the root cause of the incidents and try to prevent them from happening again. It might take multiple incidents before problem management has enough data to analyse what is going wrong, but if undertaken correctly, it turns the problem into a “known error” and steps can be put in place to correct it.
Problem management is sometimes described as a reactive process that begins only after incidents have occurred. It is better thought of as a proactive process, because its end goal is to identify the problem, fix it, and prevent it from happening again. In short, problem management identifies the problem, troubleshoots it, documents the issue and its causes, and ultimately resolves it. The emphasis of Problem Management is on resolving the root cause of errors and finding permanent solutions. This process operates at the enterprise level.
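
The link between repeated incidents and a problem record can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ProblemRecord class, the raise_problems function, the idea of grouping by affected component, and the threshold of two incidents are all hypothetical, and treating a problem as a “known error” once a root cause is written down is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ProblemRecord:
    """A hypothetical problem record that links related incidents together."""
    component: str
    incident_ids: List[str] = field(default_factory=list)
    root_cause: str = ""          # stays empty until analysis finds the cause

    @property
    def is_known_error(self) -> bool:
        # Simplification: a problem counts as a known error once its cause is documented.
        return bool(self.root_cause)


def raise_problems(incidents: List[Tuple[str, str]], threshold: int = 2) -> List[ProblemRecord]:
    """Group (incident_id, component) pairs and raise a problem for repeat offenders."""
    by_component: Dict[str, List[str]] = defaultdict(list)
    for incident_id, component in incidents:
        by_component[component].append(incident_id)
    return [
        ProblemRecord(component, ids)
        for component, ids in by_component.items()
        if len(ids) >= threshold
    ]


incidents = [("INC-001", "mail-server"), ("INC-002", "printer"), ("INC-003", "mail-server")]
problems = raise_problems(incidents)
print(problems[0].component, problems[0].incident_ids)  # mail-server ['INC-001', 'INC-003']
```

The single printer incident does not raise a problem here, while the two mail-server incidents do. In practice the grouping rule would be a judgment call rather than a fixed count.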

Now, let’s look at an analogy comparing Incident management and Problem management.

Incident management is like a fire-fighter at a house fire: it comes in, immediately fixes the problem, and saves the day. Fire-fighters arrive at the scene, assess the situation, and work to put out the fire as quickly as possible without stopping to question how it started. Incident management works the same way. While it is necessary for incident management to provide fast results and repair issues within the infrastructure, it doesn’t help us find out what ultimately went wrong and why there was an issue in the first place. That’s where problem management comes in.

Problem management is like the detective who comes into the picture after the fact. Detectives weren’t there to put out the flames themselves, but they can still investigate what went wrong, figure out how the fire started, and help educate people to take preventive steps so something similar doesn’t happen again. Problem management is a vital piece of the puzzle because it addresses the root cause of the incidents and proactively prevents them from repeating and potentially causing major issues in the future. If nobody takes the time to review incidents and solve the underlying problems, the incidents will simply continue to happen and may become more serious.

Conclusion
Understanding the difference between Incident management and Problem management, and having dedicated managers for each, ensures that you are not just putting out fires all day. While immediately fixing issues in the infrastructure through incident management provides temporary relief, doing so without finding the root of the problem will soon exhaust your resources and employees. Bringing in problem management helps to investigate the cause of the incidents and puts steps in place so that they don’t continue to occur. By having a specific manager or team for this process, you will be one step closer to decreasing the rate of incidents in your organization and preventing major outages and service disruptions.

6 comments:

  1. Nice guide for anyone aspiring to make a career in the aforementioned field. Appreciate your time and insight Luqman. Thank you very much

  2. Very nice mate and nicely explained thanks for the insight.

  3. Good one Luqman helpful information
    Do keep adding more. Thank you

  4. Wonderful Article. Thanks for sharing this post
