SharePoint DR strategies

Hi Folks,
Andrzej here. My role in Predica is Managing Partner and Infrastructure Architect. Today I’d like to share with you some insights on SharePoint Server 2010/2013 Disaster Recovery (DR) design.

Truth is, designing SharePoint DR is not a trivial task. SharePoint is a distributed application with a 3-tier architecture (web, application, database), and on each of those tiers we have multiple web applications, service applications and databases that communicate with each other. Additionally, there are a number of underlying technologies that can be used in a DR design: Hyper-V Replica, SQL log shipping, database mirroring, SQL 2012 AlwaysOn and SharePoint-level backup (PowerShell, stsadm). On top of that we may have SQL Remote Blob Storage (RBS), which puts part of our content on a file store, as well as custom applications and code that integrate with SharePoint. All of these need to be included in the DR design.

There are a number of 3rd party products that simplify the DR design (or at least add new options to it), but of course none of them is free of charge. Products like Neverfail for SharePoint, AvePoint DocAve High Availability, Idera SharePoint Backup, Metalogix Replicator, or Microsoft’s Data Protection Manager might be a good choice for some companies, but I do not recommend choosing any of them before a careful technology validation/proof-of-concept and a price/performance comparison. Let’s call it “experience”. In this article I want to focus only on what is available out-of-box as the native Microsoft DR strategy.

In this article I will present two architectural patterns for SharePoint DR design:

  1. Single farm for high bandwidth, low latency DR sites
  2. Separate farm for low bandwidth, high latency DR sites

Before starting with your DR design you should collect the following information:

  • Business requirements:
    • Recovery Time Objective (RTO)
    • Recovery Point Objective (RPO)
    • Note that they may be different for different content or different applications
  • Financial budget: the DR center (DRC) will require
    • Hardware
    • Licenses (different Microsoft editions, potential 3rd party products)
  • IT operations manual labor cost and availability
  • DRC site network parameters:
    • Bandwidth (Mbps)
    • Bandwidth available (%)
    • Latency (ms)
    • QoS capabilities
    • Load-balancing switch/router capabilities

Here is how the two scenarios differ in their DR approach from the SharePoint perspective.

Scenario 1 – Single Farm

Key points:

  • 1 farm spanning 2 sites: a more hands-off approach
  • Requires synchronous database replication (e.g. with SQL AlwaysOn Availability Group), as SharePoint does not support asynchronous replication of administration/configuration databases
  • You will need a solid network connection to the DRC: the recommended minimum is 1 Gbps bandwidth and latency below 10 ms
  • Most SharePoint SQL IO will be reads (probably 60-80% for content management scenarios), so mirroring with AlwaysOn is a good solution: writes, which need to travel across sites, are less frequent than reads, which do not
  • Failure of the DRC SQL replica does not impact the main SQL instance
  • If you want to minimize the amount of traffic on the wire between sites, consider removing some databases from the availability group and using a backup/restore or re-create method for them. Good candidates are: web analytics, search, user profile
  • If required, this approach can facilitate automatic site switchover (with a witness in a 3rd location). I would not recommend this, for various reasons outside the scope of this discussion; automatic DR switchover is only applicable in specific cases
  • Use network load balancing to divert traffic to the DRC site, or keep an active/active setup, where DRC front-end/application servers connect to the currently active SQL replica
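
To get a feel for the read/write point above, here is a rough back-of-the-envelope sketch in Python. It assumes that only writes cross the site link and that reads are served locally, as described in the list. The function names and all workload figures are hypothetical placeholders for illustration, not measurements or recommendations.

```python
def replicated_write_mbps(total_io_mbps: float, read_fraction: float) -> float:
    """Estimate the write traffic (Mbps) that must cross the site link.

    Assumption: only writes are replicated to the DRC replica; reads stay
    local. Protocol overhead and compression are ignored.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    return total_io_mbps * (1.0 - read_fraction)


def usable_link_mbps(bandwidth_mbps: float, available_percent: float) -> float:
    """Share of the site link that is actually free for replication."""
    return bandwidth_mbps * available_percent / 100.0


# Hypothetical figures: 400 Mbps of SQL IO, 70% reads, 1 Gbps link, 50% free.
writes = replicated_write_mbps(400.0, 0.70)
usable = usable_link_mbps(1000.0, 50.0)
print(f"writes to replicate: {writes:.0f} Mbps, usable link: {usable:.0f} Mbps")
print("link looks sufficient" if writes < usable else "link looks too small")
```

Treat the result as a first sanity check only; a proof-of-concept with real traffic on your own link will tell you far more.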

[Diagram: Scenario 1, a single farm spanning the main and DRC sites]

Scenario 2 – Separate Farm

Key points:

  • Separate farm in DRC
  • Only content databases are replicated asynchronously across the site link
  • Does not have stringent requirements on the site link
  • Requires extra manual work to manage and update the DRC farm
  • Switchover procedure requires more time and more steps: DNS or load balancer changes, bringing the content databases online and attaching them to the DRC farm
  • When restoring service applications, make sure you restore through the SharePoint API (PowerShell/stsadm/Central Administration), as Microsoft does not support restoring some service applications or the configuration/administration databases from SQL backups
  • If you use SQL Remote Blob Storage (RBS) on some of your content DBs, the good news is that with SQL 2012 AlwaysOn you can replicate content databases. Just remember to configure RBS on the replica as well
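
Since the switchover in this scenario is a sequence of manual steps, it is worth checking early whether their combined duration fits your RTO. Below is a minimal Python sketch of that arithmetic. The step names come from the list above, while the `SwitchoverStep` class, the helper functions and all durations are hypothetical and should be replaced with timings from your own DR drill.

```python
from dataclasses import dataclass


@dataclass
class SwitchoverStep:
    name: str
    minutes: float  # hypothetical duration; measure it in a real DR drill


def total_switchover_minutes(steps: list[SwitchoverStep]) -> float:
    """Sum the step durations, assuming the steps run one after another."""
    return sum(step.minutes for step in steps)


def meets_rto(steps: list[SwitchoverStep], rto_minutes: float) -> bool:
    """True if the estimated switchover time fits within the RTO."""
    return total_switchover_minutes(steps) <= rto_minutes


# Steps taken from the list above; every duration is a made-up placeholder.
steps = [
    SwitchoverStep("Bring content databases online", 20),
    SwitchoverStep("Attach content databases to DRC farm", 15),
    SwitchoverStep("Change DNS or load balancer", 10),
]
print(total_switchover_minutes(steps), meets_rto(steps, rto_minutes=60))
```

The sketch assumes strictly sequential steps; if your team can run some of them in parallel, the estimate will be pessimistic.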

[Diagram: Scenario 2, a separate farm in the DRC]

What to choose … where to go …???

These three aspects will have the most impact on the DR design decisions:

  • your Main-DRC network capabilities
  • RPO/RTO requirements
  • manual labor cost of managing a second, separate farm.
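
As a rough illustration of how these aspects interact, here is a small Python sketch of a first-pass decision helper. The network thresholds are the ones recommended for Scenario 1 (min. 1 Gbps, latency below 10 ms). The `SiteLink` class, the `suggest_pattern` function and the three outcomes are my own hypothetical simplification; RPO/RTO requirements still need to be reviewed separately for any real design.

```python
from dataclasses import dataclass


@dataclass
class SiteLink:
    bandwidth_mbps: float
    latency_ms: float


def suggest_pattern(link: SiteLink, can_staff_second_farm: bool) -> str:
    """Very rough first-pass suggestion, not a substitute for a DR design."""
    # Thresholds from the Scenario 1 recommendation: min. 1 Gbps, <10 ms.
    link_ok = link.bandwidth_mbps >= 1000 and link.latency_ms < 10
    if link_ok:
        return "single farm"
    if can_staff_second_farm:
        return "separate farm"
    # Weak link and no capacity to run a second farm: something has to give.
    return "revisit requirements"


# Hypothetical sites used only to exercise the function.
print(suggest_pattern(SiteLink(1000, 5), can_staff_second_farm=False))
print(suggest_pattern(SiteLink(100, 40), can_staff_second_farm=True))
```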

This was a very general overview, and there are a number of hybrid approaches.

My goal here was only to make you aware of some caveats. If you are still hesitant about how to approach SharePoint DR or any other SharePoint subject for that matter, do not hesitate to contact us!

This was your DR-host for today, Andrzej … stay tuned for upcoming posts.

Cover picture: (cc) Innisfree Hotels