In a previous blog post, I discussed how Denver (DEN) and Atlanta (ATL) airports’ transportation systems provided a powerful commentary on airport design and disaster recovery. While airport architecture is clearly different from IT architecture, the blog highlighted some critical overlapping concepts.
DEN and ATL chose different strategies when developing their terminal transport systems. DEN implemented a single path to terminals C and D, and the result is an extremely long and inefficient recovery when that path fails. ATL, in contrast, incorporates a system with multiple transportation methods, including one that is independent of technology. In the world of IT, DEN’s strategy is similar to having a single protection methodology. Having one approach is certainly better than having none, but it does not give flexibility of choice with regard to backup and, more importantly, recovery. The result is that it can be difficult to meet corporate SLAs with a single protection strategy, and DEN clearly illustrates this in its inability to recover from an unexpected train failure.
The core protection problem in IT is that not all applications have the same SLAs. For example, personal user shares, while certainly important for user productivity and satisfaction, are unlikely to significantly impact a business if they experience extended downtime. In contrast, other applications like financial systems or order entry applications, to name a couple, are typically critical, and downtime could result in significant declines in worker productivity and customer experience, and even in revenue loss. To address these differing needs, data protection practitioners should look at a range of technologies to ensure that they can meet both the RTO (recovery time objective, or how long it can acceptably take to get data back in the case of a loss) and the RPO (recovery point objective, or how much data you are at risk of losing at any given moment) for all their applications.
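To make the idea concrete, here is a minimal Python sketch of checking which protection methods satisfy a given application's RTO and RPO. Everything in it is an assumption for illustration: the class names, the `methods_meeting_sla` helper, and especially the recovery and data-loss figures, which are invented placeholders rather than measurements of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionMethod:
    name: str
    recovery_minutes: int   # assumed typical time to restore service
    data_loss_minutes: int  # assumed worst-case gap between protected copies


@dataclass(frozen=True)
class Application:
    name: str
    rto_minutes: int  # recovery time objective
    rpo_minutes: int  # recovery point objective


# Hypothetical figures, chosen only to illustrate the comparison.
METHODS = [
    ProtectionMethod("nightly backup", recovery_minutes=480, data_loss_minutes=1440),
    ProtectionMethod("hourly snapshots", recovery_minutes=30, data_loss_minutes=60),
    ProtectionMethod("continuous data protection", recovery_minutes=15, data_loss_minutes=1),
]


def methods_meeting_sla(app, methods=METHODS):
    """Return the names of methods that satisfy both the RTO and the RPO."""
    return [
        m.name
        for m in methods
        if m.recovery_minutes <= app.rto_minutes
        and m.data_loss_minutes <= app.rpo_minutes
    ]


user_shares = Application("user shares", rto_minutes=24 * 60, rpo_minutes=24 * 60)
order_entry = Application("order entry", rto_minutes=30, rpo_minutes=5)

print(methods_meeting_sla(user_shares))
print(methods_meeting_sla(order_entry))
```

With these made-up numbers, every method is good enough for user shares, while only continuous data protection meets the tighter objectives assumed for order entry. That is the point of looking at a range of technologies rather than one.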
If we look at RTO and RPO in the case of DEN, it is clear that the airport did not think through these concepts. If they had embraced these ideas, they might have created multiple recovery options that could be implemented based on the timeliness of flight departures and passenger schedules. For example, they could have had a small number of buses parked at the airport, ready to go at a moment’s notice, to transport passengers with flights leaving within an hour. This would reduce the delays and give them time to mobilize the larger-scale transportation recovery mechanism. From an IT standpoint, the equivalent strategy would be to incorporate different protection methodologies. For example, an IT practitioner could use traditional backups as one recovery method; this approach is similar to DEN’s offsite buses because it can take time to activate and implement the recovery. Other technologies like continuous data protection or snapshots could also be implemented, which would deliver better RTOs and RPOs and thus reduce expected downtime and data loss; this strategy is similar to having onsite buses at DEN airport. By combining two different protection models, both IT and DEN can better optimize how they recover from unexpected failures.
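The onsite and offsite bus idea can be sketched as a simple triage: recover the most time-sensitive applications first using the fast mechanism, and leave the rest to the slower one. The Python below is only an illustration under stated assumptions. The `build_recovery_plan` function, the 60-minute cutoff (borrowed from the "flights leaving within an hour" example) and the RTO values are all hypothetical.

```python
def build_recovery_plan(rto_by_app, fast_tier_cutoff_minutes=60):
    """Order applications by RTO (tightest first) and assign a recovery tier.

    Applications whose RTO is at or below the cutoff use the fast tier,
    the IT analogue of the onsite buses; the rest fall back to traditional
    backup, the analogue of the offsite buses.
    """
    plan = []
    for app, rto in sorted(rto_by_app.items(), key=lambda item: item[1]):
        if rto <= fast_tier_cutoff_minutes:
            tier = "snapshot/CDP"
        else:
            tier = "traditional backup"
        plan.append((app, tier))
    return plan


# Hypothetical RTOs in minutes, for illustration only.
plan = build_recovery_plan(
    {"user shares": 1440, "order entry": 30, "financial system": 60}
)
for app, tier in plan:
    print(f"{app}: {tier}")
```

Here order entry and the financial system are restored first from the fast tier, and user shares wait for the traditional backup, much as passengers with later flights would wait for the larger bus fleet.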
The key lesson we can learn from DEN is that it is critical to think through RTO and RPO when architecting a protection solution. Failing to do so results in a situation similar to DEN's, where a failure of one component can have a large impact on corporate productivity, customer experience and even revenue creation. Fortunately for DEN, it is a major airport, so travelers have few alternate options. We are not so lucky in the world of IT, where an extended outage can have far-reaching repercussions.