Using Root Cause Analysis to End Your Next Infrastructure Fire Drill

Imagine that you’re in a house and the room next to you is burning. You would never simply close the door to the room and go about your business, because you know that the fire will eventually consume the house. The obvious reaction is to put out the fire. Yet when companies see a crucial part of their infrastructure on fire, they often simply close the door and go about their business.

It’s an oversimplified analogy, but the sheer complexity of today’s enterprise networks not only makes it hard to put out a fire, it often makes it hard to tell where the fire is. Teams do their best by making educated guesses that favor operational uptime and performance, goals meant to ensure that as few users as possible are affected by the fire. The solutions some teams craft often do minimize the problem, but they don’t fix it.

Back to the fire analogy: teams might close the door to the burning room but still need to reach the room on the other side of it. When part of an organization’s infrastructure goes down, IT teams are asked to ensure that as few end users as possible are affected. So they create pathways to circumvent the room, or build additional rooms to get around the fire. In the real world, that could mean adding more capacity with new servers, paying for expensive emergency services and engineering untested solutions, all of which add complexity to an infrastructure that still has a burning room within it.

The better answer is root cause analysis (RCA), a methodology that looks at current and historical infrastructure data to set a benchmark for how everything should run in a stable environment, then uses analysis to tell IT teams where the problem is within their infrastructure.
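To make the benchmark idea concrete, here is a minimal sketch in Python. It assumes a simple statistical baseline (mean and standard deviation per metric) and ranks current readings by how far they stray from it. The metric names, sample values and function names are hypothetical, chosen only for illustration; this is not how EDM or any particular RCA product works, and a real solution would also correlate events across components.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Return {metric: (mean, stdev)} from samples taken while the environment was stable."""
    return {name: (mean(samples), stdev(samples)) for name, samples in history.items()}


def rank_deviations(baseline, current):
    """Score each metric by its distance from baseline, in standard deviations, largest first."""
    scores = {}
    for name, value in current.items():
        mu, sigma = baseline[name]
        if sigma > 0:
            scores[name] = abs(value - mu) / sigma
        else:
            # A perfectly flat history: any change at all is maximally suspicious.
            scores[name] = float("inf") if value != mu else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical metrics and values, for illustration only.
history = {
    "web_latency_ms": [100, 110, 90, 105, 95],
    "db_latency_ms": [20, 22, 18, 21, 19],
    "storage_queue_depth": [4, 5, 3, 4, 4],
}
current = {"web_latency_ms": 130, "db_latency_ms": 24, "storage_queue_depth": 40}

baseline = build_baseline(history)
ranked = rank_deviations(baseline, current)

for name, score in ranked:
    print(f"{name}: {score:.1f} standard deviations from baseline")
```

In this made-up data every metric looks unhealthy, but the storage queue sits far further from its baseline than the web or database latency does, which points to the storage layer as the room that is actually burning rather than the symptoms downstream of it.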

With a powerful root cause analysis solution such as EDM, IT teams spend less time searching for the problem, less effort on guessing how to minimize the issue and less money on trying to circumvent the problem. We believe that teams should spend more time fixing an infrastructure problem the first time and less time searching for it. Root cause analysis enables teams to keep the infrastructure they have, open the door and simply put out the fire.

Interested in finding out more about automating root cause analysis? Read our ebook that details the five steps proactive IT operations teams take to leverage anomaly detection and event correlation to reduce mean time to resolution.