Predicting Drive Failure Using Artificial Intelligence

In today’s digital world, the completeness of your data protection is critical. Users are more aware that traditional backup cannot address the threats and challenges of the modern world. Instead, what is needed is modern cyber protection – a more complete, multilayered approach that protects all environments, systems, and data.

While cyber protection is most frequently discussed in relation to managing and securing operating software, it can also involve managing the hardware that stores your valuable data – namely your disk drives. If your drive fails, any data stored on it can be lost forever.

Alexander Ivanyuk, Acronis Cyber Officer
Author: Alexander Ivanyuk, senior director, product and technology, Acronis

All hard drives will fail at some point. While this is particularly true of traditional, magnetic disk hard drives (a.k.a. hard disk drives or HDD), newer solid-state drives (SSD) that use flash memory also face issues once their memory cells start to fail. Several factors can lead to errors in these data storage devices.

For complete cyber protection, ensuring the stability and reliability of your drive is critical to safeguarding your data.

Disk Failure Is Often Unexpected

Disk drive failure typically makes some of the stored data inaccessible, and there’s often no cheap or easy way to restore that data. For individuals, the issue is usually solved by using a backup product such as Acronis True Image, which allows users to restore data from backup.

When we’re talking about running a business, however, the issue of disk failure is much worse. A failed hard drive means downtime – and all of the lost revenue and increased expenses that entails. In the best-case scenario, the data is backed up and the only loss is the time required to reimage the system on a new drive – but even that causes unscheduled delays and a reprioritization of IT resources. For a managed service provider, such a disk failure can become a real headache, as a technician may have to visit the customer.

Beyond data access, there is also the question of data safety. If a hard drive fails before the next scheduled backup runs, all of the work done since the last backup can be lost. Depending on the backup configuration, that loss can range from a few hours to several days.
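To make the size of that window concrete, here is a minimal sketch of the arithmetic. The function name and the timestamps are hypothetical and chosen only for illustration; they do not come from any Acronis product.

```python
from datetime import datetime, timedelta


def work_at_risk(last_backup: datetime, failure: datetime) -> timedelta:
    """Return the span of changes not covered by any backup when the drive fails."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


# Illustrative values: a nightly backup at 02:00 and a drive failure at 17:30.
last_backup = datetime(2024, 1, 8, 2, 0)
failure = datetime(2024, 1, 8, 17, 30)
print(work_at_risk(last_backup, failure))  # 15:30:00
```

In the worst case the drive fails just before the next backup starts, so the upper bound on lost work is simply the backup interval itself.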

Cyber Protection Includes Predicting Disk Drive Failures with Acronis AI

As an innovative technology company, Acronis has been investing in various artificial intelligence (AI) and machine learning (ML) based technologies for several years. Many of our developments focus on data safety predictions, so the disk drives used as storage media were of natural interest to our engineers. The ultimate goal was to decrease the downtime caused by storage media failure by providing users with an easy, automated way to both predict the failure and quickly recover from it.

Acronis achieved this goal by developing Acronis Disk Health Service, which predicts disk drive failure and alerts the administrator, suggesting various actions to decrease expected downtime.

Acronis Disk Health Service is a cloud-based service hosted in the company’s data centers around the world. Using machine learning, it processes a large number of disk-drive-related parameters to build a model that can predict, with a high degree of accuracy, when a drive will fail.

The main source of data is the S.M.A.R.T. reports of the disk drive itself, with many parameters collected and analyzed at scheduled intervals. Several Windows operating system events related to HDD and SSD performance are analyzed as well, including reported bad blocks, delayed or failed disk access, delayed writes to disk, and file system corruption.
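As a rough illustration of how such inputs could be combined, the sketch below flattens one scheduled observation of a drive into a fixed-order numeric vector that a model could consume. It is not the Acronis implementation: the `DiskSample` class, the event names, and the choice of S.M.A.R.T. attribute IDs are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical event names, mirroring the event categories mentioned in the text.
EVENT_TYPES = ("bad_block", "disk_access_delay", "delayed_write", "fs_corruption")

# S.M.A.R.T. attribute IDs picked for illustration only:
# 5 = reallocated sectors, 187 = reported uncorrectable errors,
# 197 = current pending sectors, 198 = offline uncorrectable sectors.
SMART_IDS = (5, 187, 197, 198)


@dataclass
class DiskSample:
    """One scheduled observation of a single drive (hypothetical structure)."""
    smart: dict[int, int]  # S.M.A.R.T. attribute ID -> raw value
    events: list[str] = field(default_factory=list)  # OS events since the last sample


def to_features(sample: DiskSample) -> list[float]:
    """Flatten a sample into a fixed-order vector: S.M.A.R.T. values, then event counts."""
    smart_part = [float(sample.smart.get(attr_id, 0)) for attr_id in SMART_IDS]
    event_part = [float(sample.events.count(name)) for name in EVENT_TYPES]
    return smart_part + event_part


sample = DiskSample(smart={5: 12, 197: 3},
                    events=["bad_block", "delayed_write", "bad_block"])
print(to_features(sample))  # [12.0, 0.0, 3.0, 0.0, 2.0, 0.0, 1.0, 0.0]
```

Keeping the vector in a fixed order means samples taken at different times, or from different drives, line up column by column, which is what most ML training pipelines expect.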

Acronis software agents also collect the read/write performance on endpoint machines, so both the transfer rate and the disk queue length are taken into consideration when creating the ML model. Disks are actively monitored for latency distribution, giving the AI more data to process. This monitoring does not slow down the user machine, as Acronis agents are optimized to take their measurements during disk operations that already happen while software is running.
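One common way to describe a latency distribution is with percentiles rather than a single average, because a degrading disk often shows up first in the slowest requests. The following is a minimal sketch under that assumption; the function name, the choice of percentiles, and the millisecond unit are illustrative and are not taken from the Acronis agent.

```python
import statistics


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize a window of per-request disk latencies, given in milliseconds."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies_ms),
    }


# Illustrative window: mostly fast requests with a few slow outliers.
window = [4.0] * 95 + [40.0, 55.0, 70.0, 120.0, 300.0]
summary = latency_summary(window)
print(summary["p50"], summary["max"])
```

In a window like this the median stays low while the upper percentiles and the maximum jump, which is the kind of signal an average alone would blur.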

As a result, the admins at a service provider see a dashboard in which each customer appears as a color-coded block.

The color of each block corresponds to the current average state of the disks installed in the computers of a specific customer. Each customer’s box can be expanded to show the health status of each disk.
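The mapping from an average disk state to a block color can be pictured with a short sketch. The 0 to 100 score scale, the thresholds, and the color names below are hypothetical values chosen for illustration; the article does not state the ones the actual dashboard uses.

```python
from statistics import fmean


def block_color(disk_scores: list[float]) -> str:
    """Map a customer's per-disk health scores (0-100, assumed scale) to a block color."""
    if not disk_scores:
        return "gray"  # no disks reporting for this customer
    avg = fmean(disk_scores)
    # Thresholds are illustrative assumptions, not Acronis's actual values.
    if avg >= 80:
        return "green"
    if avg >= 50:
        return "yellow"
    return "red"


print(block_color([95, 90, 85]))  # green
print(block_color([95, 40]))      # yellow
```

An average can mask a single failing drive among many healthy ones, which is why the per-disk view behind each block matters.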

Customers can see their cumulative disk health statistics. More importantly, based on a drive’s health status, they can run various backup operations and other tasks right from the management interface – minimizing the effect of a disk failure and reducing their downtime.

Making Data Even More Secure

Well-optimized, properly trained ML models can add significantly to the security of data stored on disk drives, whether magnetic or solid-state. In internal tests, Acronis found that its ML-based disk health prediction model is correct more than 98% of the time. Deploying this service will ultimately enable businesses that use Acronis products supporting this technology to:

  • Decrease downtime
  • Minimize data loss
  • Plan administrators’ and technical specialists’ time

Similarly, service providers that use the Acronis Cyber Cloud platform will be able to decrease the costs of supporting their customers.

Acronis currently operates 18 data centers around the globe, with 5,000 PB of customer data stored and 20,000 new drives under management added every month.

Alexander Ivanyuk is senior director, product and technology at Acronis.