Disaster Recovery is Still About Recovery Time

How do virtualization and cloud affect disaster recovery?

Well, in many ways, it doesn’t. Across town or across clouds, disaster recovery is still about recovery time.

If you have responsibility for a business, you’re concerned with being operational to meet your customers’ needs. This responsibility includes contingencies for unplanned outages.

Depending on your business, the required recovery time will vary regardless of whether you run your data center in a physical, virtual, or cloud environment.

Recovery times will determine the right solution, and since many technologies now support both physical and virtual environments, you are likely to find an appropriate solution for your specific needs.

My best conversations with customers about replication and disaster recovery (DR) strategies have been around a whiteboard, looking at the data center environment and working out the basics of what data needs to go where, and in what time frame.

Case in point: a visit from the CIO of a major financial clearing house. Given the choice between death by PowerPoint and whiteboarding his problem, he wisely chose to sketch out his data center challenge.

As we worked through his storage and network topology, it became clear that while he came in wanting a zero-recovery-time solution, all he needed was a less expensive, disk-based backup system.

Why? He had a 24-hour recovery-time requirement and nothing more.

Here’s a high-level look at how our conversation went. The different technologies we looked at were covered in the last 15 minutes of our one-hour meeting.

Business Value in the Data

Disasters can happen to anyone, and they do. So, whether it’s your legacy physical IT shop, a new virtual data center with your mission-critical applications, or that nifty cloud strategy you put in place with your service provider, you need a replication and DR strategy that’s right for you.

In this situation, the CIO had physical systems and was in the process of moving to virtualization. However, the few basic self-assessment steps I suggested to him were the same as if he’d had a full-blown cloud deployment:

First, ask yourself… What data is most important? How much data can you afford to lose? Not the easiest question, particularly for this user.

Second… How quickly do you need to restore your critical processes? You might be willing to sacrifice some application performance to ensure data integrity.

What if everything comes to a complete stop?

In this case, the time needed to be back online was dictated by industry regulations.

Third… What are your obligations, whether to your customers or to a regulatory body? The CIO had compliance requirements particular to the financial industry that dictated a maximum recovery time of 24 hours.

Finally, consider what might cause the disaster situation. DR is sometimes construed as pertaining only to natural disasters. This CIO had an offshore business directly in the path of seasonal hurricanes.

But it’s the seemingly innocuous things, like a prolonged power outage or equipment failure, that most often cause disasters.

Service Levels Dictate the Right Solution

The key to determining the best replication and DR solution is a thorough understanding of service levels; that is, application availability and recovery.

Before getting into the choices, the CIO and I focused on two key measurements, as should you:

Recovery-point objective (RPO): the point in time to which critical data must be restored following an interruption before its loss severely impacts the organization.

Recovery-time objective (RTO): the amount of time that it takes to recover the data and restart business services before the absence of the data or applications severely impacts the organization.
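To see how the two measurements differ, here is a minimal Python sketch that works out what a single outage cost in each dimension and checks the result against the objectives. The Incident class, its field names, and the timestamps are all hypothetical, chosen only for illustration: a nightly backup at 01:00, a failure at 14:00, and service restored at 23:00.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_good_copy: datetime    # most recent recoverable copy before the outage
    outage_start: datetime      # moment the interruption began
    service_restored: datetime  # moment business services were running again


def achieved_rpo(incident: Incident) -> timedelta:
    """Data-loss window: everything written after the last good copy is gone."""
    return incident.outage_start - incident.last_good_copy


def achieved_rto(incident: Incident) -> timedelta:
    """Downtime: how long the business went without its data or applications."""
    return incident.service_restored - incident.outage_start


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data-loss window and the downtime are within target."""
    return achieved_rpo(incident) <= rpo and achieved_rto(incident) <= rto


# Hypothetical incident, for illustration only.
incident = Incident(
    last_good_copy=datetime(2024, 3, 4, 1, 0),
    outage_start=datetime(2024, 3, 4, 14, 0),
    service_restored=datetime(2024, 3, 4, 23, 0),
)

print(achieved_rpo(incident))  # 13:00:00 of data lost
print(achieved_rto(incident))  # 9:00:00 of downtime
print(meets_objectives(incident, timedelta(hours=24), timedelta(hours=24)))  # True
print(meets_objectives(incident, timedelta(hours=4), timedelta(hours=24)))   # False
```

The same outage passes a 24-hour target on both counts but fails a 4-hour RPO, even though service came back well inside the RTO. That is why the two objectives need to be set separately.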

Making the Right Choice

Not surprisingly, different replication technologies yield different RPO and RTO results. And, though zero RPOs and RTOs may seem ideal, each approach has its place depending on the business need.

Let’s step through them.

Tape vaulting: Offsite tape storage takes the most time, since physical tape must be retrieved from a storage facility and fed back into the infrastructure, which can mean one to two days (24-48 hours). Tape vaulting may be appropriate for small businesses, or for larger organizations where data is retained for analytical purposes or archived to meet compliance requirements.

Iron Mountain comes to mind for off-premises tape vaulting services from a third party.

Tape or disk backup: Full backups are another common practice, but with exponential data growth, they have become a challenge as organizations grapple with how to recover quickly. Recovery from tape can take days; fortunately, recovery from disk can take less time, often around 8 to 10 hours or better.

EMC Data Domain provides disk-based backup and recovery, with de-duplication that speeds the replication process in ever-shrinking backup windows by copying only changed data.

Asynchronous replication: This technology uses mirrored copies taken at set time intervals to provide recovery times from minutes to hours, but it still may not be appropriate for situations with the most stringent recovery requirements.

A good example of asynchronous replication is the widely adopted EMC Symmetrix Remote Data Facility (SRDF) and specifically, SRDF/A for long-distance replication.

Synchronous replication: This replication technology leverages simultaneous mirrored copies to provide zero RPO, which may be preferred for business-critical application and data protection in larger enterprises, but is only as good as the last copy.

EMC Symmetrix Remote Data Facility (SRDF) again fits the bill, but this time it’s SRDF/S for zero data loss over regional distances (i.e., less than 100 miles).

Continuous replication: With the ability to roll data back to previous points in time, continuous replication leverages a journal to deliver the efficiencies of asynchronous replication (but without the need for full mirrored copies) with a near-zero RPO more akin to synchronous replication.

Look no further than EMC RecoverPoint for a good example of continuous local data protection as well as continuous remote replication with DVR-like recovery for DR and other use cases.

Depending on your DR need, one or more of these replication technologies may be appropriate for you; the RPO and RTO you target may affect your choice.

For the CIO cited here, with a 24-hour recovery requirement, a disk-backup system was the right solution.
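The whiteboard exercise can be sketched in a few lines of Python: list the options, then pick the first one that meets both targets. This is only an illustration under stated assumptions. The Option type and first_fit function are hypothetical names, the ordering is an assumed cheapest-first ranking, and only the tape (48 hours) and disk (10 hours) recovery times come from the ranges quoted above. Every other figure is a placeholder, not a vendor specification.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Option(NamedTuple):
    name: str
    rto: timedelta  # worst-case time to be back in service
    rpo: timedelta  # worst-case window of lost data


# ASSUMPTION: listed cheapest first. Tape and disk RTOs use the upper end of
# the ranges in the text; all other numbers are illustrative placeholders.
OPTIONS = [
    Option("tape vaulting", timedelta(hours=48), timedelta(hours=24)),
    Option("disk backup", timedelta(hours=10), timedelta(hours=24)),
    Option("asynchronous replication", timedelta(hours=4), timedelta(minutes=30)),
    Option("continuous replication", timedelta(hours=1), timedelta(minutes=1)),
    Option("synchronous replication", timedelta(hours=1), timedelta(0)),
]


def first_fit(rto_needed: timedelta, rpo_needed: timedelta) -> Optional[Option]:
    """Return the first (assumed cheapest) option meeting both targets, if any."""
    for option in OPTIONS:
        if option.rto <= rto_needed and option.rpo <= rpo_needed:
            return option
    return None


# A 24-hour requirement and nothing more, as in the CIO's case.
choice = first_fit(timedelta(hours=24), timedelta(hours=24))
print(choice.name)  # disk backup
```

With these numbers, tape vaulting is ruled out because its worst case exceeds 24 hours, and disk backup is the first option that fits. Tightening the RPO to zero moves the answer all the way to synchronous replication.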

The bottom line is that, regardless of whether it’s virtual, physical, or cloud, disaster recovery is still about recovery time.

About the Author: Mark Prahl