My apologies, but someone had to tell you.
You don’t have a Disaster Recovery Plan™. You have a Disaster Recovery Hope. (paraphrased from a source I’ve forgotten…)
If I’m wrong (and I hope I am), it’s because you are in the 10% (optimistically) of companies that actually test their DR plans and document the results.
In the 18 years I’ve been asking the following question, I’ve gotten one excellent and correct response, and one that was good, but I still poked holes in it.
“If your most important database server melted right now, how long until you are back online?”
I get a lot of different responses to this question:
- 2 days
- 30 minutes
- I don’t know
- Blank stares (it’s a response…just a scary one)
If you are not testing your DR plan, you may be betting your company’s existence on a flawed premise: “Our people know what to do. We have backups.”
If your DR plan is a digital “living document” that lives only on your SharePoint server, you are already in trouble: that server may be one of the things that is down. Everyone involved needs a current, non-digital copy on their desk and at their home.
Disaster recovery is about far more than restoring databases. But, for my mostly-SQL Server audience, let me ask you this:
Can you do an emergency test restore of your most mission-critical database onto identical hardware, to a point 27.5 minutes ago, and have everything work as expected?
If you don’t do test restores, you are HOPING your backups are valid. If you don’t have the ability to replace critical hardware or spin up an identical VM, you are HOPING that your server or SAN never fails.
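To make the point-in-time question concrete, here is a minimal sketch in Python that computes the "27.5 minutes ago" timestamp and builds the T-SQL statements for a test restore. The database name and backup paths are hypothetical, and the sketch assumes a single full backup followed by an unbroken chain of log backups, restored on a separate test server (so no MOVE clauses are shown). It only generates the script; running it against a real server, and checking that the application works afterward, is the actual test.

```python
from datetime import datetime, timedelta


def build_restore_script(db_name, full_backup, log_backups, now, minutes_back=27.5):
    """Return (stop_at, statements) for a point-in-time test restore.

    Assumes one full backup plus an unbroken chain of log backups
    (illustrative names only; nothing here is specific to your environment).
    """
    if not log_backups:
        raise ValueError("a point-in-time restore needs at least one log backup")

    # Target recovery point, formatted as an ISO 8601 string for STOPAT.
    stop_at = (now - timedelta(minutes=minutes_back)).strftime("%Y-%m-%dT%H:%M:%S")

    # Restore the full backup but leave the database in the restoring state.
    statements = [
        f"RESTORE DATABASE [{db_name}] FROM DISK = N'{full_backup}' "
        "WITH NORECOVERY, REPLACE;"
    ]
    # Apply each log backup in order; STOPAT halts replay at the target time.
    for path in log_backups:
        statements.append(
            f"RESTORE LOG [{db_name}] FROM DISK = N'{path}' "
            f"WITH NORECOVERY, STOPAT = N'{stop_at}';"
        )
    # Bring the database online once the logs have been applied.
    statements.append(f"RESTORE DATABASE [{db_name}] WITH RECOVERY;")
    return stop_at, statements


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    stop_at, script = build_restore_script(
        "SalesDB_RestoreTest",
        r"D:\Backups\SalesDB_full.bak",
        [r"D:\Backups\SalesDB_log_1.trn", r"D:\Backups\SalesDB_log_2.trn"],
        now=datetime.now(),
    )
    print("\n".join(script))
```

Note that the sketch says nothing about how long the restore takes. Timing the real run, from "server melted" to "application verified", is what turns a hope into a number.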
Even if you are running through your DR plan regularly, with everyone ready to go and knowing their place in the process…you are HOPING the next disaster happens when everyone is available, online and up to speed. What if that disaster happens when your Sr. DBA is on vacation? Who is her backup? A Jr. DBA?
If you don’t know the answer to the “How long…” question above, you really should go find out who in your organization does. If nobody does, contact me.
If you think you have it 100% nailed…I challenge you to let me into your server room unattended.
Thanks for reading!
I like the word “her” 🙂 and according to our last test, we need 15 minutes to 4 hours (depending on what exactly melted down and whether we can switch over to the log backup server or have to restore the whole server).
Welcome to the 10% that actually test their plan! Congratulations! What if that server room burns? Is the backup server somewhere else, or one rack over? 🙂
We have a second server center in another building about 100 m away, and the VMs and backups are automatically synced there.
We would have serious problems if a meteor or a big bomb destroyed both buildings. In that case we would need a restore from credit card and tapes (but don’t ask me for details regarding the tapes; our server team “knows this stuff and I have to trust them” :-).
I know that the folks who cannot or will not answer your questions, the ones who “are hopeful”, are hopeful for truly economic reasons, whether accurately calculated or “just felt in the gut”. These are folks who might say or think, “Sure, Kev, of course you are right, but true DR is so damned expensive; we are barely getting by as it is.”
I do not work for such a company right now. But I have in the past, and the gamble worked out; in fact, if the gamble had not been made, the whole enterprise might have just gone away. I would just feel a lot better about these types of “realities” if the economics of the DR gamble were calculated with rigor. I’d feel better if we knew, for sure, that it was just not cost-effective to gamble. Or, that some “partial gamble” was in fact economically wise. Or, that the cost of DR that met our minimal recovery goals was in fact “money well spent”.
My gripe is that too much random gambling occurs; a deliberate calculated gamble I am more willing to accept. Many folks are not even aware that they are gambling with their business.
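One rough way to turn a random gamble into a calculated one is to compare, for each option, the yearly DR spend plus the expected yearly loss from an outage. The sketch below does that in Python. Every figure in it is made up for illustration, and the option names, the outage probability, and the cost-per-hour estimate are all assumptions you would have to replace with your own numbers.

```python
def expected_annual_loss(outage_probability_per_year, downtime_hours, cost_per_hour):
    """Expected yearly loss = chance of an outage * hours down * cost per hour."""
    return outage_probability_per_year * downtime_hours * cost_per_hour


def compare_dr_options(options, cost_per_hour):
    """Return total expected yearly cost per option.

    options maps a name to (annual DR spend, outage probability per year,
    expected downtime in hours if the outage happens).
    """
    totals = {}
    for name, (annual_spend, probability, downtime_hours) in options.items():
        loss = expected_annual_loss(probability, downtime_hours, cost_per_hour)
        totals[name] = annual_spend + loss
    return totals


if __name__ == "__main__":
    # Hypothetical figures only: (annual spend, outage probability, downtime hours).
    options = {
        "no DR (pure gamble)": (0, 0.05, 48),
        "partial DR": (10_000, 0.05, 4),
        "full DR": (60_000, 0.05, 0.5),
    }
    totals = compare_dr_options(options, cost_per_hour=10_000)
    for name, total in sorted(totals.items(), key=lambda item: item[1]):
        print(f"{name}: {total:,.0f} per year")
```

A simple expected-value comparison like this understates the case for DR when a long outage could end the company outright, so treat it as a starting point for the conversation rather than an answer.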
“Many folks are not even aware that they are gambling with their business.”
— I very much agree!
I believe the original quote being paraphrased here is:
“Until you test a backup by restoring, you don’t have recovery plan, you have a recovery hope.”
~ Geoff N. Hiten
You are correct…found the proper quote late last year I think…