Whether it was caused by a fire, flood or other natural disaster, or resulted from a malicious criminal act, an IT system outage can cripple your business. Technology is essential to almost all operational processes today, so experiencing downtime means you can’t answer customer inquiries, develop new products, run production lines, ship your product, or keep your employees productive. System outages are costly and stress-inducing at best. In the worst-case scenario, they can cause firms to close their doors permanently.
According to research firm Gartner, downtime costs companies an average of $5,600 per minute, with hourly costs ranging from $140,000 to $540,000, depending on the business’s size and vertical. Gartner also found that 43 percent of small to midsize businesses (SMBs) shut down immediately in the wake of a “major loss” of data, with as many as 51 percent ceasing operations within two years of such an event.
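To make those figures concrete, here is a minimal sketch of the arithmetic. It assumes the $5,600-per-minute average quoted above applies uniformly for the whole outage, which is a simplification. The function name `downtime_cost` is ours and purely illustrative.

```python
def downtime_cost(minutes: float, cost_per_minute: float = 5600.0) -> float:
    """Estimate the cost of an outage lasting `minutes`.

    The default rate is the per-minute average cited in the article;
    substitute your own figure if you have one.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# One hour at the average rate lands inside the quoted hourly range.
hourly = downtime_cost(60)
print(f"One hour of downtime: ${hourly:,.0f}")  # One hour of downtime: $336,000
```

At the average rate, a single hour of downtime comes to $336,000, which sits within the $140,000 to $540,000 hourly range.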
IT outages can and do affect businesses of all sizes in all industries. Cloud computing giant Google spent over $13 billion on data center infrastructure in 2019, but still saw multiple network-wide system outages. Smaller businesses with leaner technology budgets are naturally more vulnerable to hardware failures, but even the newest and most advanced computers, servers, and storage devices can still break down unexpectedly.
In fact, the majority of unplanned downtime—59 percent of it—is caused by human error, and this rate has remained consistent over the past few years. Even as the inherent reliability of hardware continues to increase, IT systems are growing more complex, and thus present their users with additional opportunities to make mistakes. People are imperfect by nature, and all businesses are at significant risk of system outages caused by employee errors and accidents.
What’s changing, however, is the amount of data loss and downtime that’s now being caused by cyberattacks. Whereas hardware failures and human errors have held steady positions as top causes of data loss since 2014, malicious acts now account for a share of the problem that’s 11 percentage points greater than it was five years ago. Ransomware alone is estimated to have cost global businesses more than $11.5 billion in 2019, and it’s said that an attack takes place every 14 seconds. Forecasters predict that by the close of 2020, ransomware will be attacking businesses once every 11 seconds.
The sobering reality is that today’s companies are more likely to experience major IT system outages than ever before, even as their day-to-day operations become increasingly dependent upon technology. The reason is simple: as the amounts of downtime caused by human error, natural disaster, and hardware failure remain relatively stable, cybercriminals continue to become more sophisticated and resourceful, and their attacks are better targeted and more likely to cause harm.
It’s incumbent upon business leaders who value their customers’ trust and who wish to protect their company’s reputation—and safeguard its future—to develop a plan for managing these risks.
Simply put, there’s nothing that an ill-prepared organization can do to avoid significant costs, process interruptions, and employee anxiety or panic in case of a major system outage.
The benefits of advance planning for how your company will handle downtime and disasters are twofold. Well-prepared firms will experience far less downtime, lower costs, and fewer business disruptions. At the same time, they’re less likely to experience significant system outages in the first place.
Business continuity and disaster recovery (BCDR) planning is crucial for boosting your organization’s resilience. Having a well-thought-out, intentional, and best practice-based BCDR plan in place can increase employees’ confidence, protect your business’s reputation, and improve your ability to manage risks. For instance, in the case of ransomware attacks, businesses with comprehensive BCDR plans in place are 92 percent less likely to experience significant downtime than those that don’t have them.
The “business continuity” component of BCDR planning involves establishing step-by-step procedures that your employees can follow in order to return your business to regular operations as soon as possible in case of a natural disaster, IT system outage, or other catastrophic event. These steps may include temporary manual replacements for technology-dependent workflows, but it’s also important to outline the human resources and third-party services you’ll need to call in for help, as well as the specific functions they’ll perform.
In contrast, disaster recovery planning consists of implementing technologies and best practices to ensure business-critical IT systems get up and running again as quickly as possible after a crisis, and that data loss is minimized or prevented.
The most important element in any business’s disaster recovery strategy is conducting regular backups of critical systems. No matter your business’s size, the fact that you have reliable backups that are stored offsite and in isolation from your central IT environment will dramatically reduce your risk of incurring significant costs or damages in case of a system outage or cyberattack.
Backups alone don’t constitute a complete disaster recovery strategy, however. Besides maintaining copies of your business-critical data in a secure secondary location, you should also be testing your recovery procedures regularly. It takes time to restore data and applications from backup, so it’s essential to ensure that the recovery process can be completed quickly enough to protect the continuity of your business’s operations. When planning and testing your recovery procedures, you’ll want to keep track of two important metrics.
Like all mechanical systems, backup hardware devices don’t last forever. Repeated use will eventually cause tape and spinning disk (HDD) drives to malfunction, and even solid state or all-flash storage arrays can be written to a finite number of times. Thus an important element of testing backups is ascertaining the ongoing health of the systems, and making sure that they’ll work when needed.
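Part of that health check can be automated. The sketch below is one possible approach, not a complete backup test: it compares a SHA-256 checksum of a source file with the checksum of its backup copy, so that silent corruption on aging media shows up before you need to restore. The helper names `sha256_of` and `backup_matches` are hypothetical, and the sketch assumes both files are readable from the machine running it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large backup files don't fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source: Path, backup: Path) -> bool:
    """True if the backup copy is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(backup)
```

A matching checksum only shows that the copy is intact. It does not replace a full test restore, which is still the way to confirm that recovery completes quickly enough.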
With the advent of cloud computing, reliable backup and recovery infrastructures have become more accessible and affordable for businesses of all sizes. It used to be that only the largest of enterprises could bear the cost of building full-scale redundant systems that could automatically take over computing capabilities in case of primary system failure. The cloud’s resource-sharing models have made backup and recovery as-a-service options cost-effective for even the smallest of businesses, and easy for those with small IT staffs to take advantage of. To be of any use, however, these systems must be put in place before disaster strikes.
In most cases, how long it takes a business to recover from a full-scale IT system outage (and thus avoid devastating financial losses, reputational damage, and other severe consequences) is a function of how well prepared that business was to face the event. The relationship is simple: the better your disaster recovery plan, the fewer disruptions you’ll experience.
Businesses that don’t have a well-developed plan in place will incur higher costs and more disruptions, and should engage an IT service provider with extensive experience helping companies in this situation mitigate their losses. Generally speaking, the less prepared a business is, the more specialized expertise it will require to get its systems up and running again.
In some situations, such as a hardware failure, fire, flood, or natural disaster, the cause and scope of the outage will be obvious. Hardware damage will be readily apparent, and its extent will be clearly defined.
When businesses are struck by ransomware or are the victims of other malware-based cyberattacks, however, it’s essential to get expert assistance in this area. Cybercriminals deliberately employ cloak-and-dagger tactics whenever possible. Ransomware is frequently engineered to re-infect machines after their hard drives have been re-imaged and data restored from backups, or designed to infiltrate and corrupt backup systems that aren’t properly isolated from primary networks. Identifying which individual endpoint device was the original source of infection is key to thwarting these sorts of persistence strategies. Accurately identifying the exact strain of malware that’s involved in a cyberattack is also critical to ensuring that all infectious elements have truly been wiped from your systems.
During this phase, your team will also need to gather documentation on the software, hardware, and configuration settings that are in place in your IT environment. This information will be essential in forensic investigations, and will also be useful when you’re assessing whether systems or resources should be immediately replaced.
An essential part of disaster recovery is defining the roles and responsibilities that members of your team—or any external consultants you’re working with—will need to assume in times of crisis. The first few hours after an incident begins to unfold are times of uncertainty and stress for all your business’s employees. Giving everyone clear directions and a well-defined set of steps to follow can reduce stress and panic.
The duties involved in responding to a major security incident are wide-ranging. Employees in multiple departments (including marketing and PR as well as IT) will have critical roles to play, and the company’s leadership should stay involved and informed at all times. Depending on the nature and scope of the incident, you may need to involve external cybersecurity consultants and law enforcement officials, and may need to communicate with employees, customers and the general public. At a minimum, your technical team should include a cybersecurity expert, a networking expert, a senior system administrator, and at least two people who can manage desktop remediation.
An experienced IT service provider can lead you in managing the full process.
Within the first 24 hours of any major IT systems outage, you’ll need to assess the availability and reliability of the backups you have in place. If you’ve got recent and granular snapshots of all affected data on hand and the problem is limited in scope or was immediately contained, you could be back in business the next day.
Conversely, if you don’t have backups at all, or if your backup systems were also compromised in the incident, full recovery may take months.
This phase should include a thorough cost-benefit analysis of all available recovery options. In some cases, alternatives that might seem more expensive initially (such as replacing all affected hardware) might actually save you money over the longer term by reducing the number of hours of emergency IT services you’ll need to consume, and enabling you to upgrade to better-performing, more secure, and more resilient systems that will reduce your risks in the future.
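The comparison described above can be sketched as a simple total-cost calculation. Every figure in this example is hypothetical and chosen only to illustrate the point. The `RecoveryOption` class and its fields are our own assumptions about which costs matter: upfront spending, emergency IT service hours, and the cost of downtime.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    upfront_cost: float            # e.g. replacement hardware (hypothetical)
    emergency_hours: float         # hours of emergency IT services needed
    hourly_rate: float             # rate charged for those services
    downtime_hours: float          # expected hours until systems are back
    downtime_cost_per_hour: float  # what each hour of downtime costs you

    def total_cost(self) -> float:
        """Upfront spend plus emergency services plus downtime losses."""
        return (
            self.upfront_cost
            + self.emergency_hours * self.hourly_rate
            + self.downtime_hours * self.downtime_cost_per_hour
        )


def cheapest(options: list[RecoveryOption]) -> RecoveryOption:
    """Return the option with the lowest total cost."""
    return min(options, key=lambda o: o.total_cost())


# Illustrative numbers only: repairing looks cheaper upfront,
# but costs more once service hours and downtime are counted.
repair = RecoveryOption("repair in place", 5_000, 120, 200, 72, 1_000)
replace = RecoveryOption("replace hardware", 40_000, 30, 200, 24, 1_000)

for option in (repair, replace):
    print(f"{option.name}: ${option.total_cost():,.0f}")
print("Lowest total cost:", cheapest([repair, replace]).name)
```

With these made-up inputs, replacing the hardware totals $70,000 against $101,000 for repairing it, even though its upfront cost is eight times higher. A real analysis would also weigh factors this sketch leaves out, such as insurance coverage and the long-term value of more resilient systems.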
Be sure to evaluate the extent of your insurance coverage, and what your policy will or won’t pay for. Some cybersecurity insurance companies will cover—and may even encourage—paying ransoms to criminals in exchange for their promise to restore your data. Bear in mind, however, that although data restoration rates in cases where victims have elected to pay the criminals are improving, you never have any guarantee that the encryption key you’re provided will actually work. The decision to pay ransom or not should take the reputation of the criminals into consideration.
Which plan is right for your business depends on the cause of your IT systems outage, your budget, and your tolerance for downtime. There’s no one-size-fits-all plan that’ll work for all companies in all industries.
There are, however, industry-wide best practices and protocols that should guide your recovery efforts. In the case of a cyberattack, for instance, responding to a major incident requires the right tools and technologies, the expertise to know how to use them, and an understanding of the proper protocols to follow. The more readily available these critical elements are when disaster strikes, the less time your recovery will take.
For many companies, there may be a silver lining to IT disasters. Often cyberattacks or hardware failures give those who suffer them the opportunity to modernize technology infrastructure, move to cloud-based services, upgrade software, and improve resilience overall. A crisis’s short-term costs may actually translate into long-term business benefits that ultimately increase efficiency and productivity.
The old saying “what doesn’t kill you makes you stronger” is clearly applicable here. Though a significant percentage of SMBs do go out of business in the weeks and months following a severe IT system outage, those that do not often emerge from the incident with more resilient systems and better processes in place.
The key to making lemonade from the lemons of disaster recovery is reliable documentation and honest discussion. Be sure to keep records of all activities that take place during the incident response and recovery process so that you can assess what worked well and what could have gone better. It’s especially important to conduct a post-mortem session where your team can assess the lessons you’ve learned from the incident, and analyze how best to incorporate them into your business continuity and recovery plan for the future.
The most important step your business can take to safeguard its future from the potentially devastating consequences of unplanned downtime is to develop a comprehensive business continuity and disaster recovery plan. Such preparedness can make the difference between success and business failure, especially in today’s landscape of increasingly sophisticated cybersecurity threats.
But BCDR planning isn’t always straightforward or easy. A managed IT service provider with specific experience in your industry can help you lay the groundwork for real resilience and preparedness. The MSP can guide you in selecting backup and recovery solutions that fit your business needs and budget, can help you choose cybersecurity technologies that will protect the hardware and equipment you rely on, and can assist you in building business processes for increased resilience.
Here at CNS Partners, we have more than 20 years of experience working with midsize organizations in the manufacturing sector. We have a deep understanding of the specific IT security challenges you face—including the recent escalation of the threat posed by ransomware—and we know how important it is to control quality and optimize production in your facilities.
To learn more about the ways we’ve helped other companies just like yours build resilience into their IT environments cost-effectively, download CNS Partners’ Expert Guide to High-Performing IT Systems today.