Last Updated on January 15, 2024
How well do you really know your plan? Chances are you really don’t know if your recovery plan is going to work until you test it. Because no matter how good you are at determining requirements and developing plans, no one gets everything 100% right coming out of the gate. Interdependencies, data flows… if you do map everything correctly the first time, take that extended vacation—you’ve earned it.
For example, Marketing might say, “We’re strategic, so if we’re down for a week it’s not going to hurt anybody.” But then you talk to Sales and they say: “We gotta be up within two days, and to do what we do we must have this data feed from Marketing, without which we’re dead.”
Interdependencies between systems often become so routine, so taken for granted, and so frequently misunderstood or just plain invisible that very few people really know where all the information that populates the systems they rely on actually comes from. So unless you test your recovery plan, you aren’t going to know what you don’t know.
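The Marketing and Sales mismatch is the kind of conflict you can look for on paper before a test, provided someone has written the dependencies down. Here is a minimal sketch in Python, assuming recovery targets in hours and a hand-maintained dependency map. The names `rto_hours`, `depends_on`, and `find_rto_conflicts` are hypothetical, and the check only looks at direct dependencies.

```python
# Hypothetical recovery targets, in hours, taken from the example:
# Marketing says a week is fine, Sales needs to be up within two days.
rto_hours = {"Marketing": 7 * 24, "Sales": 2 * 24}

# Hypothetical dependency map: Sales needs a data feed from Marketing.
depends_on = {"Sales": ["Marketing"]}


def find_rto_conflicts(rto_hours, depends_on):
    """Return (unit, dependency, unit_rto, dependency_rto) tuples where a
    direct dependency is allowed to recover later than the unit needs it."""
    conflicts = []
    for unit, deps in depends_on.items():
        for dep in deps:
            if rto_hours[dep] > rto_hours[unit]:
                conflicts.append((unit, dep, rto_hours[unit], rto_hours[dep]))
    return conflicts


conflicts = find_rto_conflicts(rto_hours, depends_on)
for unit, dep, need, have in conflicts:
    print(f"{unit} must be up in {need}h but depends on {dep}, whose target is {have}h")
```

A check like this only catches the dependencies somebody already knows about. The invisible ones are the reason an operational test is still needed.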
Unfortunately, most recovery plan testing is limited to a tabletop exercise, not an operational failover. In a tabletop test, you pull out your recovery plan, review it, and talk through a scenario.
At that level, everything may look fine. But it’s only through a failover that you’ll shake out the bugs and ensure that your alternate facility or alternate processing capability accounts for all the actual interdependencies.
Upgrades are another common source of recovery-related issues. When Marketing switched from System F to System G, was the recovery plan updated with all the infrastructure changes? Unless your documentation is current and those changes are carried over into your recovery plan, the plan may be missing connections that were in place previously. That can have a significant impact on recovery, slowing down the entire process as delays trickle down to other systems that may need technical attention.
Another question is: are the backup systems (still) configured the same way as the primary systems? Unless you’re operationally testing them, you don’t really know. Some organizations have found out the hard way that their backup systems weren’t compatible with their production systems, because the primary systems were upgraded to, say, Oracle 12c but the backup systems were still running on Oracle 10g.
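Between operational tests, a simple inventory comparison can at least flag this kind of drift. The sketch below assumes you can export a component-to-version inventory for each site. The inventories, the `web_tier` entry, and the `find_drift` helper are hypothetical placeholders, not output from any real tool.

```python
def find_drift(primary, backup):
    """Return {component: (primary_version, backup_version)} for every
    component whose version differs or that exists on only one side."""
    drift = {}
    for component in sorted(set(primary) | set(backup)):
        primary_version = primary.get(component)
        backup_version = backup.get(component)
        if primary_version != backup_version:
            drift[component] = (primary_version, backup_version)
    return drift


# Hypothetical inventories; the database versions mirror the example above.
primary_inventory = {"database": "Oracle 12c", "web_tier": "release-A"}
backup_inventory = {"database": "Oracle 10g", "web_tier": "release-A"}

drift = find_drift(primary_inventory, backup_inventory)
for component, (primary_version, backup_version) in drift.items():
    print(f"{component}: primary={primary_version}, backup={backup_version}")
```

Matching version strings do not prove the backup site actually works, so treat this as a supplement to operational testing rather than a substitute.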
The fact is that recovery/business continuity testing must be done operationally so that if something goes wrong, it goes wrong in a safe environment where your business and its reputation are not at stake and you don’t have to put your neck on the chopping block. Then afterwards you can go about your business with a much higher expectation of recovery success and a much more accurate understanding of what your true recovery capability is.
Here’s another thing an operational test will validate: Say you’ve got 100 systems in your data center. The 20 most critical of them need to be recovered within 12 hours, while the least critical 20 can be delayed for 10 days or more, and the rest fall somewhere in between.
Say you estimate that your techs can recover each critical system within four hours. But you’ve got 20 systems that have to be recovered within 12 hours. Do the math: if you can recover one system every four hours, then in 12 hours you can bring up only 3 systems, and that’s if you work around the clock. And until you conduct the operational test, that four-hour figure is just that: an estimate. Responding to an actual disaster is not the time to find out your estimates were off.
So what do you do then? Either adjust your recovery requirements, choose a different strategy that allows you to bring systems up faster, or hire more IT staff.
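To make the arithmetic concrete, here is a rough Python sketch. It assumes each recovery team works on one system at a time and that teams can work in parallel without getting in each other’s way, which a real test may well disprove. The function names are hypothetical.

```python
import math


def systems_recoverable(window_hours, hours_per_system, teams=1):
    """Systems that can be fully recovered inside the window, assuming each
    team recovers one system at a time and teams work in parallel."""
    return teams * math.floor(window_hours / hours_per_system)


def teams_needed(systems, window_hours, hours_per_system):
    """Smallest number of parallel teams that fits all systems in the window."""
    per_team = math.floor(window_hours / hours_per_system)
    if per_team == 0:
        raise ValueError("one system cannot be recovered inside the window")
    return math.ceil(systems / per_team)


# Figures from the example: 20 critical systems, 12-hour window, 4 hours each.
print(systems_recoverable(12, 4))  # 3
print(teams_needed(20, 12, 4))     # 7
```

On those assumptions, a single team recovers 3 systems in the window, and covering all 20 takes about 7 teams working in parallel. The alternatives are a faster recovery strategy or a longer window.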
Until you conduct operational failover testing, you’re never really sure you can recover your systems within recovery time objectives (RTOs) and recover your data within recovery point objectives (RPOs). This is how you learn things like: we can bring up System 1 in three hours, System 2 takes only two hours, and System 3 takes eight hours.
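Once you have measured times, comparing them against your commitments is straightforward. In this sketch the measured hours come from the example above, while the RTO values and the `rto_misses` helper are made up purely for illustration.

```python
# Measured recovery times, in hours, from the example above.
measured_hours = {"System 1": 3, "System 2": 2, "System 3": 8}

# Hypothetical RTOs, in hours; substitute your own commitments.
rto_hours = {"System 1": 4, "System 2": 4, "System 3": 6}


def rto_misses(measured_hours, rto_hours):
    """Return {system: hours_over} for systems that took longer than their RTO."""
    return {
        name: measured - rto_hours[name]
        for name, measured in measured_hours.items()
        if measured > rto_hours[name]
    }


misses = rto_misses(measured_hours, rto_hours)
for name, over in misses.items():
    print(f"{name} missed its RTO by {over} hours")
```

With these illustrative targets, only System 3 misses, by two hours. A tabletop review would not surface a gap like that.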
Tabletop testing is great, and it has its place, but it can’t give you the real, complete picture of whether you can fulfill your recovery commitments with the assets and strategies you have in place.
In Part 2 of this post I’ll discuss how to plan and conduct operational testing of recovery plans.
To get expert support on crafting operational recovery tests that not only measure your true recovery capability but are also grounded in operational reality, contact Pivot Point Security.