An MSP and SMB Guide to Disaster Preparation, Recovery and Remediation
It’s important for a business to have an exercised business continuity and disaster recovery (BC/DR) plan in place before it gets hit with ransomware so that it can resume operations as quickly as possible. Key steps and solutions should be followed to prepare for and respond to cyber threats or attacks against your organization.
Preparation may be as simple as deploying antivirus plus backup and recovery applications for your end users. It may also be a more complex approach that couples security operations center (SOC) tools or managed response solutions with network security tools such as DNS and web filtering, network and endpoint firewalls, VPNs, backup and recovery, and others.
It’s also essential to ensure end-users are trained on ransomware threats as a part of a good security awareness training program. The bottom line is, if prevention tools and training fail and your organization is compromised, you need to have a protection plan that gets your company assets and resources back to work quickly and securely.
What preparation is needed
When contemplating an in-depth plan, specific questions come to mind. The whats, the hows, the whys and, most importantly, the whos must all be defined in the plan. When asking these questions, we need to be prepared to identify the resources, people and applications included. We must determine how to react to the situation and execute the logical steps and processes required to reduce damage as quickly as possible.
Below are some questions to get us started.
- Who will be involved in recovery and communication when your DR plan is in action?
- How much downtime can your organization withstand?
- What service level agreement (SLA) do we need to provide to the business and users?
- What users do we need to recover first?
- What tools do we have to reduce risk and downtime within the environment?
- How are user networks separated from operational or business networks?
- How quickly can data protection tools get us up and running again?
- Can users get their data back if an endpoint device is compromised?
- Can we determine when the ransomware first hit the network or endpoint devices?
- Are we able to stop the proliferation of ransomware or malware throughout the network?
- Can we recover quickly to a specific point in time?
- Can our users access their data from the cloud before it has been restored?
The solutions below, coupled with an exercised BC/DR plan, will help reduce your organizational risk exposure and allow for quick remediation.
- An endpoint security solution capable of determining what events took place and when
- A DNS security solution capable of turning away security threats at the network level
- A solution for endpoint backup and recovery that can safeguard data should these other solutions be compromised
Lines of Communication
Just as important as the technology are the people who manage and maintain the systems that support the different business units within an organization. For example, your security team and your endpoint support team need to be in regular discussions about how the teams will communicate when under attack. You need to determine who is responsible for which systems and when each person should be brought into the process during an attack.
System Response Ratings
A response rating system can help determine which systems or employees require a higher degree or speed of response. To do this, organizations must specify the value of each system or resource and where that resource sits in terms of protection and remediation priority. This is often determined by the value of the resource in monetary terms. For example, if the loss of a specific system would cause a massive loss of incoming revenue, it might be necessary to give that system a higher protection and remediation priority than, say, a standard file server.
The same can be said for specific individuals. Often C-level resources and mid-tier executives need to be out in front of a situation, which highlights the importance of making sure their resources like laptops and portable devices are protected and uncompromised. They are often as important as critical servers. It is necessary to classify systems, users and customers regarding their criticality to the business and place priorities based on the rating of those resources.
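The classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Asset` fields, the three-level rating and the revenue thresholds are all hypothetical and would need to be replaced with values agreed on by the business.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    kind: str                        # "system" or "user device"
    hourly_revenue_loss: float       # assumed estimate of revenue lost per hour of downtime
    executive_facing: bool = False   # True for devices used by C-level or other key staff


def response_rating(asset: Asset) -> int:
    """Return a rating from 1 (respond first) to 3 (respond last).

    The thresholds are illustrative assumptions, not recommendations.
    """
    if asset.executive_facing or asset.hourly_revenue_loss >= 10_000:
        return 1
    if asset.hourly_revenue_loss >= 1_000:
        return 2
    return 3


def recovery_order(assets: list[Asset]) -> list[Asset]:
    # Sort by rating first, then by largest revenue impact within a rating.
    return sorted(assets, key=lambda a: (response_rating(a), -a.hourly_revenue_loss))


assets = [
    Asset("file-server", "system", 200.0),
    Asset("order-portal", "system", 25_000.0),
    Asset("ceo-laptop", "user device", 0.0, executive_facing=True),
    Asset("hr-database", "system", 1_500.0),
]

for asset in recovery_order(assets):
    print(response_rating(asset), asset.name)
```

In this sketch the executive's laptop receives the same rating as the revenue-critical system, which reflects the point that key individuals' devices are often as important as critical servers.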
Now that we know a bit of the who, what, and how, let’s look at how to recover anything from a single system to an entire enterprise.
Recovery and Remediation
Recovery is an integral part of any BC/DR plan. It gives organizations a playbook of what to do and when. But it’s not enough to recover your data. Admins also need to understand the remediation process that should be followed to prevent further infection of systems or proliferation of malware within an organization.
Consider the following scenario. Ransomware hits users’ laptops, encrypting all of the data. The laptops have antivirus protection, but no DNS protection. Network security is in place in the form of firewalls and VPNs, with some network segmentation. There is also a security team in addition to the end-user support team. The ransomware that hit is polymorphic, meaning that it changes to prevent detection even if the first iteration of the ransomware is isolated.
The first step is consulting the endpoint security console to learn when and where the malware was first seen. If backups are still running, they should be suspended at this point to prevent infected data from being backed up. This can be done either from the dashboard or with an automated script that suspends backups on all devices or only on those that have been compromised.
A dashboard should make it easy to handle single systems, while scripts can help with thousands of devices at a time. APIs can help to automate processes like bulk suspend and bulk restore of devices. At this point, if network segmentation is enabled, it may be prudent to block traffic from the infected areas to prevent the spread of malware.
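A bulk suspend script might look roughly like the sketch below. No specific backup product or API is assumed: the `suspend_backup` callable is a hypothetical stand-in for whatever call your backup vendor provides, and the device IDs are made up for illustration.

```python
from typing import Callable, Iterable


def bulk_suspend(
    device_ids: Iterable[str],
    suspend_backup: Callable[[str], None],
) -> dict[str, str]:
    """Suspend backups for each device, recording the outcome per device ID."""
    results: dict[str, str] = {}
    for device_id in device_ids:
        try:
            suspend_backup(device_id)
            results[device_id] = "suspended"
        except Exception as exc:
            # Keep going so one failure does not halt the whole batch.
            results[device_id] = f"failed: {exc}"
    return results


# Stand-in for a real vendor API call (hypothetical); it only records the request.
suspended: list[str] = []


def fake_suspend_backup(device_id: str) -> None:
    if not device_id:
        raise ValueError("empty device ID")
    suspended.append(device_id)


compromised = ["LT-0012", "LT-0047", ""]
report = bulk_suspend(compromised, fake_suspend_backup)
print(report)
```

Passing the suspend operation in as a function keeps the loop and error reporting independent of any one vendor, and it lets the script be rehearsed against a stand-in during BC/DR exercises.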
Now it’s time to review the protection platform to determine the date the file was first noticed, the dwell time and when the encryption/ransomware started executing. Once these facts have been determined, it’s possible to track down how the organization was breached. Understanding how malware entered the network is critical to preventing future infections. Since, in our example, ransomware infected devices, a tested and reliable recovery process is also necessary.
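As a rough illustration of the timeline review, the sketch below derives the first-seen time, the start of encryption and the dwell time from a list of events. The event names, record layout and timestamps are hypothetical, and dwell time is assumed here to mean the gap between the file first being seen and encryption starting.

```python
from datetime import datetime

# Hypothetical events as they might be exported from an endpoint security console.
events = [
    {"device_id": "LT-0047", "event": "file_first_seen", "time": "2024-03-04T09:15:00"},
    {"device_id": "LT-0012", "event": "file_first_seen", "time": "2024-03-02T14:30:00"},
    {"device_id": "LT-0012", "event": "encryption_started", "time": "2024-03-05T02:10:00"},
    {"device_id": "LT-0047", "event": "encryption_started", "time": "2024-03-05T02:25:00"},
]


def earliest(events: list[dict], event_name: str) -> datetime:
    """Return the earliest timestamp recorded for the given event type."""
    times = [datetime.fromisoformat(e["time"]) for e in events if e["event"] == event_name]
    return min(times)


first_seen = earliest(events, "file_first_seen")
encryption_start = earliest(events, "encryption_started")
dwell_time = encryption_start - first_seen  # assumed definition, see the note above

print("First seen:", first_seen)
print("Encryption started:", encryption_start)
print("Dwell time:", dwell_time)
```

A restore point would normally be chosen from before `first_seen`, since anything backed up after that moment may already contain the malicious file.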
Understanding the timeline of events is critical to the recovery process. The timing of the first step in the attack determines the point in time you restore to. Once an admin can zero in on the date and time to restore from, affected devices can be compiled into a CSV file, each marked with its device ID number, to reactivate any backups that were halted once the breach was discovered.
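Compiling the affected devices into a CSV file could be done along these lines. The column names, device IDs and restore timestamp are illustrative assumptions; a real backup product will define its own expected layout.

```python
import csv
import io
from datetime import datetime


def build_restore_csv(device_ids: list[str], restore_point: datetime) -> str:
    """Return CSV text with one row per affected device and the restore timestamp."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["device_id", "restore_date", "restore_time"])
    for device_id in sorted(set(device_ids)):  # de-duplicate and keep a stable order
        writer.writerow([
            device_id,
            restore_point.date().isoformat(),
            restore_point.strftime("%H:%M:%S"),
        ])
    return buffer.getvalue()


# Hypothetical restore point chosen from before the malware was first seen.
restore_point = datetime(2024, 3, 2, 12, 0, 0)
csv_text = build_restore_csv(["LT-0047", "LT-0012", "LT-0047"], restore_point)
print(csv_text)
```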
Once the data, source and target device IDs, and the date and time to restore from are combined with a bulk restore script, a bulk restore can be pushed to the same laptops or to new laptops. While this happens, users of solutions that offer web portals can get back to work quickly.
The right tools, planning, importance hierarchy and communication channels across a business are essential for establishing cyber resilience. Once a timeline of a breach has been determined, these elements make restoring to a pre-infection state a process that can be planned and perfected with practice.
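A bulk restore driven by such a file might be sketched as follows. The CSV columns, the `RestoreJob` type and the `restore_device` callable are hypothetical; the callable stands in for whatever restore call your backup vendor actually exposes.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class RestoreJob:
    source_device_id: str
    target_device_id: str   # same as the source, or a replacement laptop
    restore_from: datetime


# Illustrative input; the second row restores to a new laptop.
SAMPLE_CSV = """source_device_id,target_device_id,restore_date,restore_time
LT-0012,LT-0012,2024-03-02,12:00:00
LT-0047,LT-0101,2024-03-02,12:00:00
"""


def load_restore_jobs(csv_text: str) -> list[RestoreJob]:
    """Parse the CSV into one restore job per row."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        restore_from = datetime.fromisoformat(f"{row['restore_date']}T{row['restore_time']}")
        jobs.append(RestoreJob(row["source_device_id"], row["target_device_id"], restore_from))
    return jobs


def bulk_restore(jobs: list[RestoreJob], restore_device: Callable[[RestoreJob], None]) -> list[str]:
    """Submit every job and return the source IDs of any that failed."""
    failed = []
    for job in jobs:
        try:
            restore_device(job)
        except Exception:
            failed.append(job.source_device_id)
    return failed


submitted: list[RestoreJob] = []
jobs = load_restore_jobs(SAMPLE_CSV)
failed = bulk_restore(jobs, submitted.append)  # list.append stands in for a vendor call
print(len(submitted), "restores submitted,", len(failed), "failed")
```

Returning the failed device IDs rather than stopping at the first error lets the remaining restores proceed while an admin follows up on the exceptions.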