It’s not an overstatement to say that effective incident management is the foundation of a successful NOC. It enables your MSP to proactively handle critical IT infrastructure issues and restore network services quickly after unplanned interruptions.
However, MSPs often overlook key steps when building a framework around incident management, also known as an Incident Response Plan (IRP). A common mistake is treating the IRP as a collection of static documents and failing to record or incorporate lessons from past incidents. This can leave incident management outdated and ineffective against ever-increasing security breaches and network outages, and it limits your ability to track, analyze, and report trends in incident data.
To address this particular challenge, here are some practices that IT By Design uses in-house that will help improve your existing incident response plan. With these tips, you’ll be able to document solutions for repeat incidents, ensure a quicker resolution time, lower the risk of serious outages, and ultimately create long-term partnerships with your customers.
Even MSPs with the most cutting-edge IRPs should consider incorporating lessons from the contingency operations triggered by previous incidents (if any) that left their networks crippled. From emergency response to server updates, the better you understand the root cause of an incident, the better you can prevent the next one. To improve your IRP, first:
- Re-identify and re-catalog your network assets. This gives you an accurate accounting of all the data in your system and how critical and sensitive it is.
- Map out every single internal and remote device that connects to your current network—from phones to laptops to printers and more. Don’t overlook guest access in your network.
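As a minimal sketch of the cataloging step above, the inventory can be kept as structured records and sorted by how critical and sensitive each asset is. The `Asset` fields, the 1 to 5 scoring scales, and the device names here are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """One device or system on the network (hypothetical record shape)."""
    name: str
    kind: str            # e.g. "server", "laptop", "printer", "guest-device"
    criticality: int     # assumed scale: 1 (low) to 5 (business-critical)
    sensitivity: int     # assumed scale: 1 (public data) to 5 (regulated data)
    remote: bool = False  # True if the device connects from outside the office


def protection_order(assets: list[Asset]) -> list[Asset]:
    """Return assets with the most critical, most sensitive ones first."""
    return sorted(assets, key=lambda a: (a.criticality, a.sensitivity), reverse=True)


# Example inventory, including a guest device so guest access is not overlooked.
inventory = [
    Asset("guest-wifi-tablet", "guest-device", criticality=1, sensitivity=1),
    Asset("file-server-01", "server", criticality=5, sensitivity=5),
    Asset("sales-laptop-07", "laptop", criticality=3, sensitivity=4, remote=True),
    Asset("office-printer", "printer", criticality=2, sensitivity=2),
]

for asset in protection_order(inventory):
    print(asset.name, asset.criticality, asset.sensitivity)
```

Keeping the scores alongside each record means the same list can later answer which asset gets protected first during an incident.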
1. Examine IRP playbooks and document sets
Identify any shortcomings in your existing IR management strategy. If your strategy requires revamping your existing IT policies, go for it. Capabilities that your MSP did not need during its last security breach or unexpected downtime, such as work analytics, risk analysis, and IT governance, may become necessary for the next one. Consult an expert and make any necessary changes to tackle possible future challenges.
2. Make the plan cross-functional
A well-thought-out, multi-layered incident response plan comes from collaboration across sales, service delivery, and customer experience teams. Take stock of cross-departmental inputs before you revise your IR playbook, and then cross-reference it against your disaster recovery, business continuity, and enterprise crisis management plans. This will allow you to create crisis planning document sets that are as comprehensive as possible.
3. Set KPIs for better accountability
KPIs give you a measurable baseline for how well your IR team performs during an incident. As a rule of thumb, include performance benchmarks such as Mean Time Between Failures (MTBF), Mean Time to Repair (MTTR), and Mean Time to Failure (MTTF) in the IRP documents.
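To make two of these benchmarks concrete, here is a rough sketch that computes MTTR and MTBF from a list of incident records using their common definitions: MTTR as total downtime divided by the number of incidents, and MTBF as total uptime in a reporting period divided by the number of failures. The `Incident` record, the sample dates, and the 30-day reporting period are assumptions for illustration. MTTF is usually applied to components that are replaced rather than repaired, so it is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One outage on a single service (hypothetical record shape)."""
    started: datetime   # when the service went down
    resolved: datetime  # when the service was restored


def total_downtime(incidents: list[Incident]) -> timedelta:
    """Sum of outage durations across all incidents."""
    return sum((i.resolved - i.started for i in incidents), timedelta())


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean Time to Repair: total downtime / number of incidents.

    Assumes at least one incident; an empty list would divide by zero.
    """
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents: list[Incident], period_start: datetime, period_end: datetime) -> timedelta:
    """Mean Time Between Failures: uptime in the period / number of failures."""
    uptime = (period_end - period_start) - total_downtime(incidents)
    return uptime / len(incidents)


# Sample data: three outages (2 h, 1 h, 3 h) in a 30-day reporting period.
period_start = datetime(2024, 1, 1)
period_end = datetime(2024, 1, 31)
incidents = [
    Incident(datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 12)),
    Incident(datetime(2024, 1, 12, 8), datetime(2024, 1, 12, 9)),
    Incident(datetime(2024, 1, 20, 14), datetime(2024, 1, 20, 17)),
]

print(mttr(incidents).total_seconds() / 3600)                            # 2.0 hours
print(mtbf(incidents, period_start, period_end).total_seconds() / 3600)  # 238.0 hours
```

Tracking these per client or per service, rather than as one global figure, tends to make it easier to spot which environments are dragging the averages down.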
4. Keep your processes consistent
Make sure your team knows the boundaries of what they can and can’t do. An incident isn’t the time to experiment or to disregard the processes and strategies in your IRP document sets. Doing so can easily result in missed incidents, ineffective resolution, and purely reactive support of the environment. The IRP playbook must provide detailed instructions on:
- Specifying which IT asset will be protected first so that there’s no question as to what you’re accountable for in the event of an incident.
- Setting clear expectations around reporting an incident.
- Following a checklist of must-haves for network restoration.
This step will enable you to create a plan for more streamlined execution and a lower error rate in impacted clients’ workflows.
5. Reimagine your plan frequently
Sit down with your CTO regularly to discuss the latest major network threats and your MSP’s strategy to defend against them, both internally and across your clients’ networks. This may also mean refreshing your tool stack to:
- Align your technology and your clients with the latest security standards.
- Monitor systems and upgrade networks before problems occur.
- Anticipate and address issues before they arise by identifying weak areas in your clients’ networks.
- Assess the number of reactive tickets you currently receive versus the number you should have.
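The last check in the list above can be sketched as a simple comparison between the share of reactive tickets you receive and a target share. The ticket counts and the 25% target below are made-up numbers for illustration; your own target is a business decision, not something this sketch prescribes.

```python
def reactive_share(reactive: int, total: int) -> float:
    """Fraction of all tickets that were reactive (0.0 if there were no tickets)."""
    if total == 0:
        return 0.0
    return reactive / total


def excess_reactive(reactive: int, total: int, target_share: float) -> float:
    """How many reactive tickets exceed the target share (0.0 if at or under target)."""
    allowed = target_share * total
    return max(0.0, reactive - allowed)


# Hypothetical month: 120 reactive tickets out of 400, against an assumed 25% target.
share = reactive_share(120, 400)
excess = excess_reactive(120, 400, target_share=0.25)
print(f"reactive share: {share:.0%}, tickets over target: {excess:.0f}")
```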