Trainer and Consultant extraordinaire Ginger Grant stops by to talk Machine Learning, Data Bricks, Certifications, Norwegian pastries and proper chocolate frosting
Image credit to Jeff (t)
Back in June of 2019, I published this YouTube video covering the highlights of the various SQL Server High Availability and Disaster Recovery options. But I didn’t do any of it in writing like I usually do in the companion piece…so here we go!
First, some definitions:
- High Availability (HA)—typically means that the database will be back online in seconds or minutes, not hours or days
- Disaster Recovery (DR)—the ability to bring the database server and databases online after something really bad happens
- HADR—An umbrella term covering any feature that encompasses HA, DR or both.
- RPO—Recovery Point Objective—Essentially governs what point we can recover to (or…how much data loss can we handle)
- RTO—Recovery Time Objective—How long to get everything running again.
SQL Server comes with a number of features that cover different scenarios, in different ways. Some are only in the Enterprise edition.
I am going to cover these at a very high level so you can quickly get the idea in your head, and refer back to this post as needed later. We will start with the most basic.
Backup and Restore
It sounds simple, but backing up the databases is the simplest, most cost effective choice in a DR solution. BACKUP DATABASE is available in all editions of SQL Server. Express edition makes it hard to automate, because there is no SQL Agent functionality. There are answers for that, such as Windows Task Scheduler and the Ola Hallengren Scripts.
Backups by themselves are great, but in the event of an actual disaster, the ability to RESTORE them is the critical part. You should be backing up and test restoring your databases regularly.
This is database level only. Backup all user and system databases.
Backup and Restore is DR, not HA.
Log Shipping (LS)
Transaction Log Shipping at its core is just a Backup-Copy-Restore process with some bells and whistles added to it. GUI for setup, alert jobs to let you know when it gets behind, etc. But the basic concept is that every 15 minutes a T-log backup is taken, then copied to another server, then restored there. In LS, you can set up the Secondary server to be read-only between restores. You can have multiple Secondaries.
This is database level only, so Logins, jobs, etc. are not copied.
You need to be familiar with File shares, UNC paths, permissions settings, etc. to set LS up.
Log Shipping is DR only, not HA unless you write a bunch of scripts to detect an issue, catch it all up and repoint your applications to the new Primary.
Database Mirroring (DBM):
Database mirroring is a deprecated feature, but it still exists in some recent versions of SQL Server. Check the Microsoft documentation for your version. DBM is available in Standard and Enterprise editions. As the name implies, the Principal copy of the database is mirrored to the Mirror copy on a secondary server.
DBM works at the transaction level, unlike Log Shipping which uses T-log backups. DBM can be set to be either Synchronous or Asynchronous modes, and either Automatic or Manual failover. Not all combinations of these two exist.
This is Database level, so jobs, logins, etc. do not participate.
Since Automatic failover is available, I classify DBM as HA, and DR at the Database level.
Always On—a marketing term from MS that includes 2 features:
SQL Server Failover Cluster Instance (FCI)
This is a traditional Windows Cluster (WSFC) that has been around for ages, with one or more SQL Server Instances installed into it.
Failover is handled by the Windows Cluster service and is usually very quick. A few seconds, but exceptions exist, such as 10000 databases all starting up at the same time.
An over-simplified explanation—2 or more servers (nodes) are connected to a SAN. The databases exist on the SAN, so in a failover situation, they don’t move.
If the SQL Instance on Server A becomes unreachable, the cluster stops the SQL Service there and starts it on Server B.
The SAN is a single point of failure, unless it is being replicated via some non-SQL Server technology. For this reason, a SQL FCI (in my opinion) is not a full HADR solution…but definitely IS HA.
SQL Server Availability Groups (AGs):
AGs are an Enterprise only feature as of this writing.
AGs are essentially a much-improved version of Database Mirroring. Transaction level data movement from Primary to Secondary for one or more databases in a Group. Multiple groups are allowed. Synchronous or Asynchronous. Manual or Automatic failover. Readable secondaries to offload reporting queries.
Standard Edition has a “Basic Availability Group” which has lots of limitations, chief among these being one database per group.
When set up correctly, AGs are both HA and DR for the user databases, with no single point of failure. There are script options to keep the jobs, logins, etc. in sync between the Primary and all Secondary replicas.
A Windows Cluster is required, but a SAN is not. AGs work off local storage, not shared since there is a copy of each database on each server participating.
The licensing of AGs has changed a lot, so I won’t get into it here…but you probably already know that Enterprise Edition licenses are VERY expensive. Plan accordingly with your license vendor.
Replication—a very Special Snowflake:
SQL SERVER REPLICATION IS NEITHER HA NOR DR.
Not everything in a SQL Server user database CAN be replicated, such as users, or tables with no Primary Key. New objects are not automatically sent from Publisher to Subscriber. System databases are not replicated.
Replication IS a great option to send a subset of your data to another server for reporting, or for filtering by region or salesperson.
Replication is for Distributed Data Processing. Backup and Restore beats Replication when you have to rebuild an environment, every time.
I hope this helps you have a better understanding of the high level concepts of the various HA/DR options available to you. There are “gotchas” and details that are impossible to cover in this post. But, all of these are extensively well documented by Microsoft and the SQL Server Community at large.
Thanks for reading!
Image by Gordon Johnson from Pixabay
We live in amazing times.
I live in a small town. Horses and hay fields all over the place.
I run Dallas DBAs from my home office, providing income for my family, 2 employees and several awesome people that I sub-contract work out to when needed. One of them I’ve never met face-to-face.
The technology of today makes the dream work.
I have no servers here…just 3 laptops and 6 screens. All of my software comes from SaaS providers, such as Office 365 and Quickbooks Online.
All of this came together to allow me to help someone I will never meet help HIS customer solve an issue on their production system.
I took a 3 class series of online, live courses from Brent Ozar (b|t). Once a month in December, January and February. This was facilitated by GoToMeeting on Brent’s end, Slack for the attendees to communicate questions to Brent and chat with each other, and a high powered Virtual Machine I built in Microsoft Azure, just for running really heavy workloads.
During that class I got to know one or two of the attendees pretty well, from their questions, to their funny comments, etc. There were a few that I shared Direct Messages with in Slack. I looked some of them up on LinkedIn, because I might want to catch up with them in the future. Many of the same people were in all 3 of the same classes with me. By February, it was like seeing old friends that I’ve never actually seen.
Just this week, one of those attendees reached out to me asking if I had a few minutes to spare, which at that time I did (right in the middle of a VERY busy week). We got on a Zoom call, screen-shared into his system, where he had a VPN going to connect to his customer’s Azure SQL DB.
We did some troubleshooting, used some of the things we learned in class, eliminated some possible reasons and came away with some things to test to make performance faster.
Without the technology of today, there is a zero percent chance I would ever have (virtually) met Bob, connected on LinkedIn, and been able to help. Much less actually see the problem in action.
Read back through and look at all the pieces of tech and people that made this fairly simple interaction even possible. Laptops, stable internet, live online training, LinkedIn, Slack, Zoom, GoToMeeting, Azure.
We live in amazing times.
Thanks for reading!
Andy Levy was so excited about the Data Bits podcast I just had to invite him to be an early guest!
He was already on the list so I bumped up to vNext 🙂
Completely forgot to Mention:
Introducing Data Bits – the podcast (ignore mentions of ‘Data Points’ – someone else already used that!)
First live guest episode – feel free to laugh at how hilariously bad we were at this 🙂 SQL Server, Surfing, Billiards, Beer…
Hopefully I will be able wrangle more people from all parts of the data world into being guests – SQL Server, Oracle, DBA, Dev, BI, industry legends and people just getting started!
Denny Cherry (b|t) / Alan Hirt (b|t) / David Klee (b|t) / PASS.org / DBA Fundamentals VG / SQL Bits / Steve Jones (b|t) / Matt Cushing (b|t) / Erik Darling (b|t) / NoColumnName / Pinal Dave (b|t) / Adam Machanic (b|t) / sp_whoisactive / DBATools
T-SQL Tuesday is a monthly blog party for the SQL Server community. It is the brainchild of Adam Machanic (b|t) and this month’s edition is hosted by Jess Pomfret (b|t), who has asked us to write about “Life hacks to make your day easier“.
So many hacks accumulated over 30+ years of adulting…
Coffee timer. I will not own a coffee maker that doesn’t have an “Easy to change for the weekend” timer built-in. The last thing I need to try to do is set up coffee in the morning. I wake up, get cofee, get to work. Starts the whole day off on the right foot!
Technical, the first:
Since I am an independent consultant here under the Dallas DBAs name, I am frequently asked to come in and quickly find out why the server is so slow. Sometimes by existing customers, sometimes out of nowhere. Many of these times I cannot make any “permanent” or “lasting” changes, so my go-to is to run sp_whoisactive as a temporary stored procedure (code in link). Works the same, nothing left behind, no corporate policies violated.
Technical, the SQL:
Registered server queries:
In SSMS, we are all used to the Object Explorer window showing the SQL Servers we have connected to. A surprising number of people do not know about the Registered Servers window (Ctrl-Alt-G) that does not appear by default. In this you can register and group servers, and then run queries against all of them or any subset depending on how you group them.
My preference is use Prod and Non-Prod Groups for my clients with more than a few instances.
You can even export these settings so other people can easily have the same list without having to add them one by one. The most servers I have ever managed at once was 130. Every morning I ran SELECT @@SERVERNAME against the whole group just to see who wasn’t responding on any given day.
This is profile/machine specific, so its great for those that work in a corporate environment using SSMS from their laptop. Works well when set up on a jump box via RDP too. Not so much when the client has you remoting into a prod server directly and you have to keep changing your RDP target.
Thanks for reading!