By Ross Saunders (MSc student and Facilitator at Da Vinci)
26 February 2015
Introduction to disaster recovery
Within any organisation, business continuity is paramount to fulfilling its responsibilities to stakeholders and clients alike. In ordinary day-to-day operations, the continuity of activities is rarely thought of consciously; however, it is key that this exercise takes place. In the event of a disaster, be it a single server going down or a building fire, a rapid response and the implementation of a disaster recovery plan can be the deciding factor between continuing to operate and closing your doors. In this article, seven key considerations are discussed to help ensure you have covered some often-overlooked bases with regard to Disaster Recovery.
Executive Support and Understanding
Part of the reason (at least in my experience) that disaster recovery takes a back seat is that the executives of the organisation either do not understand the gravity of needing a DR plan or assume that the IT department has “taken care of it”. This is often not the case unless the department is specifically mandated to look after this critical function.
Conveying the severity of a disaster is key to educating the executive on the importance of a plan. No organisation wants to demonstrate the severity of a disaster in practice; however, this is often the only time that planning (or the lack thereof) reaches executive level.
With this in mind, it is vital that members of the IT department meet with the business executives to reach a mutual understanding of the severity of a disaster and the necessity of planning for one. This too needs to be managed in a calm and level-headed manner, as the response from business executives is often that the only acceptable situation is one in which there is never a disaster. Sadly, we do not live in a perfect world, and failures will happen, be they physical or otherwise.
The IT department should consider the use of stories as a means to discuss the acceptable turnaround time for various disaster scenarios. Each potential situation should be hashed out in a considered manner that can be understood by all parties; technical jargon is a no-no. For example, a major payment gateway system for the organisation may only be down for minutes before the business starts suffering losses, whereas the internal system containing last year’s employee picnic photos may be down for days before Christine in PR notices. Speaking of multiple systems…
Disasters Don’t Only Happen to IT Systems
Another myth of disaster recovery is that it should be self-contained within the IT department. Yes, the responsibility for DR may well fall on IT; however, the problem belongs to every department in the organisation.
Each department in the organisation should, at the very least, complete a self-assessment of its own requirements for disaster recovery. In this age of Software-as-a-Service, departments frequently subscribe to systems without the knowledge of the IT department, complicating the issue of Disaster Recovery. Similarly, a critical function of a department may not be understood by the IT department, and consequently be ignored as a non-critical function.
In both cases, the situation could be avoided through communication between departments. Both parties should understand the systemic details of the departmental functions of the business, from start to finish, in order to ensure coverage.
Cost vs Downtime
Once departmental functions have been mapped, it’s back to the executive for the golden conversation around costs versus downtime. In many cases, the shorter the downtime, the higher the cost of implementing disaster recovery.
This is where creativity and lateral thinking can greatly assist in keeping costs down while keeping availability up. Gone are the days when the only option for failover was to have an entire carbon copy of your infrastructure. Cloud and hybrid models can be easily adopted (albeit at the cost of bandwidth and performance), with certain aspects of infrastructure outsourced to scalable platforms such as Amazon Web Services or Microsoft Azure.
A novel hybrid model that I was privileged to assist with involved completely rethinking the requirements of some business functions, eventually scrapping many of the internal services entirely in favour of migrating them to a cloud service.
For a small software company hosting its applications offsite and having uncapped, high-speed data, it made little sense to retain backup servers, project management servers, file servers or code repositories on-site, and as such all of these functions were migrated to the Amazon Web Services cloud. All that remained on site was a directory server, which handled password management internally and, by means of Federation Services, in the cloud as well.
While novel in footprint and cost, this approach did have its downsides should the internet line fail; however, a fail-over internet connection carried a much lower cost than a fail-over environment would have. Sometimes you need to break the existing mould of thinking to find a better, more modern solution!
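To make the cost versus downtime conversation more concrete, the trade-off can be reduced to rough arithmetic: the yearly cost of keeping a recovery option in place, plus the losses the business suffers while that option brings a system back up. The Python sketch below is only an illustration of that reasoning. The option names, costs, recovery times, loss rates and outage frequency are all hypothetical figures chosen for the example, not data from any real organisation.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    annual_cost: float     # hypothetical yearly cost of keeping the option in place
    recovery_hours: float  # hypothetical time needed to get back up and running


def total_annual_cost(option, loss_per_hour, outages_per_year):
    """Standing cost of the option plus the expected losses while systems are down."""
    downtime_loss = option.recovery_hours * loss_per_hour
    return option.annual_cost + outages_per_year * downtime_loss


def cheapest_option(options, loss_per_hour, outages_per_year):
    """Pick the option with the lowest combined cost for a given system."""
    return min(
        options,
        key=lambda o: total_annual_cost(o, loss_per_hour, outages_per_year),
    )


# All figures below are made up purely for illustration.
options = [
    RecoveryOption("full duplicate environment", annual_cost=500_000, recovery_hours=1),
    RecoveryOption("reduced-spec cloud copy", annual_cost=60_000, recovery_hours=8),
    RecoveryOption("backups only, buy hardware on failure", annual_cost=5_000, recovery_hours=336),
]

# A system that loses money by the minute versus one that is barely missed.
payment_gateway = cheapest_option(options, loss_per_hour=20_000, outages_per_year=0.5)
picnic_photos = cheapest_option(options, loss_per_hour=1, outages_per_year=0.5)

print("Payment gateway:", payment_gateway.name)
print("Picnic photos:", picnic_photos.name)
```

The point of the exercise is not the numbers themselves but the conversation they force: each system gets its own loss rate, and the answer for one system need not be the answer for another.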
Backups Are Not Disaster Recovery
Speaking of more modern solutions, having backups within the organisation does not in any way mean that disaster recovery is in place. Yes, backups are vital to have, but not having anywhere to restore them to renders them little more than digital paperweights.
I often see organisations accept the fact that they have backups and believe that this is sufficient. These are the same organisations that realise the error of their ways too late. Procurement of hardware is, more often than not, a drawn-out, costly process. Should a server go down in a blaze of glory and silicon, replacement of the hardware may take weeks, particularly if you are in a remote location or if there is an out-of-date component involved.
Always make sure that your planning takes the wider ecosystem into account. Your backups will amount to naught if you have nowhere to restore them to.
On-site versus Off-site
This is again a discussion of cost: one needs to decide what will be held offsite and what will be held onsite. I can guarantee that a building burns to the ground faster than you would expect! Consider that significant disasters do happen, and ask yourself what you cannot live without.
Having off-site disaster recovery may be costly for a physical environment, but again, hybrid models may well come to your rescue. The cost of hosting systems in the cloud is often flexible, and depends largely on the amount of resources you wish to allocate to a system. As such, moving critical and non-critical functions to copies in the cloud with reduced system specifications, while not ideal, is cost-effective and at the very least may get you up and running in a short period of time while you set up shop in a temporary location.
Process and Plans
All of the above is worth little without a plan to implement it. Similarly, the plan is worth little without responsible people driving it. This aspect of Disaster Recovery is often neglected for reasons similar to those above, such as “the IT department will handle it”.
The IT department may also end up wasting a significant amount of time if its members do not know ‘who’ is supposed to do ‘what’. Proper planning with responsible parties is a critical part of your Disaster Recovery. The responsible staff members should be versed in the process they are assigned to, and their contact details should be readily available should a disaster happen.
Lastly, the process should be revisited on at least a quarterly basis, both to ensure that the information contained within the plan is still valid and to ensure that the responsible parties are still able to perform their duties. It does not help if a disaster occurs and Jabu (who is responsible for swapping out the hardware in the Johannesburg data centre) was transferred to Cape Town last month.
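A quarterly review of this kind can be supported by even a very simple check over the list of responsibilities. The Python sketch below is one hypothetical way to do it, assuming the plan is kept as a list of entries with an owner, the owner's current site, the site where the task must be carried out, and the date the entry was last reviewed. The field names, the 90-day interval (a rough stand-in for "quarterly"), the second staff member and all of the dates are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


@dataclass
class PlanEntry:
    task: str
    owner: str
    owner_site: str      # where the owner is currently based
    task_site: str       # where the task has to be carried out
    last_reviewed: date


def entries_needing_attention(entries, today):
    """Return (entry, reasons) pairs for entries that are stale or unstaffed on site."""
    flagged = []
    for entry in entries:
        reasons = []
        if today - entry.last_reviewed > REVIEW_INTERVAL:
            reasons.append("review overdue")
        if entry.owner_site != entry.task_site:
            reasons.append("owner no longer at task site")
        if reasons:
            flagged.append((entry, reasons))
    return flagged


# Hypothetical plan entries; names other than Jabu and all dates are invented.
plan = [
    PlanEntry("Swap out failed hardware", "Jabu", "Cape Town", "Johannesburg", date(2015, 1, 10)),
    PlanEntry("Restore file server backups", "Thandi", "Johannesburg", "Johannesburg", date(2014, 10, 1)),
    PlanEntry("Notify clients of outage", "Christine", "Johannesburg", "Johannesburg", date(2015, 2, 1)),
]

for entry, reasons in entries_needing_attention(plan, today=date(2015, 2, 26)):
    print(f"{entry.task} ({entry.owner}): {', '.join(reasons)}")
```

A check like this does not replace the review meeting; it only makes sure that transfers and forgotten entries surface before the disaster does.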
Of course, the best way to test your plans is to…
With buy-in from the executive, it should not be difficult to get time allocated for a dry run of your disaster recovery plans. This practice will expose the loopholes and faults in your planning, and allow you to correct the plan before a real disaster occurs.
Situational dry runs are also a great way to determine whether your failover systems are adequate. Run your business on them and see how they perform; you will rapidly learn whether you should be concerned or comforted.
In the end, disaster recovery is up to everyone in the organisation. It is beneficial to discuss it at all levels and to give others in the organisation an input. You’ll be amazed at what departments outside of IT have to offer in an advisory capacity when they are actively engaged.
Without fail, an isolated disaster recovery plan will result in an isolated and slow disaster recovery.
Contact Ross Saunders for more information: