HADR

SQL Server Reliability: Clearing the Fuzzy Thinking

November 19, 2025 by Kevin3NF Leave a Comment

The terminology around reliability is a mess

If you’ve ever said, “We’re covered, it’s replicated,” you’re in good company.

SQL Server is a massive, 35+ year-old product that has changed, renamed, and re-imagined its features so many times that even seasoned IT pros get tripped up. Between log shipping, mirroring, replication, snapshots, Availability Groups, clustering, and VM backups, it’s no wonder managers end up with fuzzy expectations about what keeps their data safe.This isn’t a failure of leadership; it’s the side effect of a product that’s both deep and deceptively familiar. You’ve got “copies,” “replicas,” and “backups” everywhere, but they don’t all mean the same thing.Let’s clear the fog.

Replication Isn’t Disaster Recovery

Replication was born for scale, not survival. It’s great for reporting, distributing data to remote sites, and keeping subsets of tables up to date. But if the publisher goes down? There’s no built-in recovery process. You’ll spend more time re-initializing and troubleshooting than restoring from a backup.

In addition, the subscribers are read-write. This means someone downstream can delete a row, causing updates at the publisher to not get moved, which can cause a huge issue.

Replication moves data, not databases. It’s not a safety net; it’s a delivery truck.

Log Shipping Is DR, But Not HA

Log shipping is a solid, dependable form of disaster recovery. It keeps a warm copy ready for manual failover, which is perfect for a secondary data center or a standby VM in the cloud.

But it’s not high availability. There’s no automatic failover, no continuous read/write capability, and your secondary is always behind by at least one transaction log.

Think of log shipping as a lifeboat, not a bridge. It’ll get you safely across the storm, but only after you climb in.

Snapshots Aren’t Backups

A SQL Server database snapshot freezes a view of a database at a moment in time — handy for quick rollbacks or comparing changes. A VM snapshot, on the other hand, captures the state of a virtual machine. Neither one replaces a backup that can be restored independently to a new server.

If you are snapshotting the entire server, how does that help you restore to a point in time for one database without bringing all of them backwards? That’s a really awkward conversation to have with the CEO.

Snapshots are shortcuts, not insurance policies.

Availability Groups Don’t Replace Backups

Even the shiniest Always On Availability Group can’t save you from data corruption, accidental deletes, or bad deployments that replicate instantly across every node.

Availability is not durability. You still need backups — tested, verified, and stored safely elsewhere.

An AG keeps you online, not immune.

Why It Gets Fuzzy

SQL Server grew up in an enterprise world, full of legacy features that overlap and evolve. Microsoft rarely deprecates things outright, so we’re left with a toolbox that has five ways to move data and ten ways to misunderstand them.

Don’t feel bad. SQL Server’s feature naming team must have had a bet going to see how many words they could reuse. These don’t help:

Replication sounds like redundancy.
Availability sounds like recovery.
Snapshots sound like backups.

That’s why it takes a DBA’s mindset. Understanding and translating these terms takes intentionality and years of exposure to them.

What can you do?

Inventory your protection layers. List what you’re using (replication, AGs, log shipping, backups) and define what each one really does.
Map to business needs. High availability ≠ disaster recovery ≠ data protection. You probably need all three, in different forms.
Test the ugly scenarios. Shut down primaries, take databases offline, pull power cords (in dev). You’ll learn fast which “solution” is just a hope.
Document the real RTO/RPO. Your CFO doesn’t care if it’s log shipping or AGs — they care how long you’ll be down and how much data you’ll lose.

The Bottom Line

Fuzzy thinking is normal, but it’s dangerous when disaster strikes.
Replication moves data. Log shipping recovers data. Availability Groups keep you online.
Only backups and testing keep you safe.

When you know what’s really protecting you, you can make confident, budget-friendly choices, not guesses wrapped in acronyms.

Reach out for a free basic Health Check

We can focus on your server’s Reliability and Availability.

SQL TidBit:

Want to see which databases are actually protected? This quick check shows the database name, recovery model, last full backup, and last log backup times:

-- This is code from the internet...test before you run
SELECT
    d.name AS [Database],
    d.recovery_model_desc AS [RecoveryModel],
    MAX(CASE WHEN bs.type = 'D' THEN bs.backup_finish_date END) AS [LastFullBackup],
    MAX(CASE WHEN bs.type = 'L' THEN bs.backup_finish_date END) AS [LastLogBackup]
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset bs
    ON bs.database_name = d.name
WHERE d.name <> 'tempdb'
GROUP BY d.name, d.recovery_model_desc
ORDER BY d.name;

Note: If a database is in FULL recovery model but has no recent log backup, your log file is probably growing and could eventually cause a disk-full outage.

Link Party

Erin Stellato: “The Accidental DBA Day 23 of 30: SQL Server HADR Features”
Redgate / Simple-Talk: “SQL Server Backup and Restore for the Accidental DBA”
SQLShack: “Understanding Database Snapshots vs Database Backups in SQL Server”</li >
Brent Ozar: “Transactional Replication doesn’t solve your HA or DR pains”
Microsoft Docs: “Log Shipping and Replication (SQL Server)”

Please send to your manager to help clarify the differences in SQL Server HA, DR, and distributed computing. They deserve a clear cheat sheet.

SQL Server Availability Groups

September 10, 2025 by Kevin3NF Leave a Comment

Flexibility and Scale at the Database Level

When SQL Server 2012 introduced Availability Groups (AGs), they changed the HA/DR game. Unlike Failover Cluster Instances (FCIs), which protect the whole instance, AGs protect individual databases and let you scale reads, offload backups, and stretch across data centers.

They’re not perfect, but they’ve become the go-to HA/DR option for many SQL Server environments.

What Are Availability Groups?

An Availability Group is a set of one or more user databases that fail over together between SQL Server instances.

The Primary replica handles all writes.
Secondary replicas keep synchronized copies of the databases.
Clients connect through a Listener, a virtual network name similar to an FCI’s Virtual Name, but tied to the Group rather than the instance.

Core Benefits of AGs

Database-level protection: Choose which databases to include, rather than the entire instance.
Readable secondaries: Offload reporting, ETL, or read workloads.
Backup offloading: Run backups on secondary replicas to reduce load on the primary.
Geographic flexibility: Replicas can live across subnets or even in different data centers (Distributed AG)
Multiple copies of data: Protects against both server failure and storage corruption.

How AGs Differ from FCIs

Storage:
- AG: multiple independent copies of the data local to the replica)
- FCI: single shared storage, single point of failure.
Scope:
- AG: database-level protection only.
- FCI: instance-level protection (includes system DBs, jobs, logins, etc.).
Connectivity:
- AG: Listener provides a single name for apps, routes them to the current primary.
- FCI: Virtual Name tied to the whole instance.

Clusterless / Read-Scale AGs

Not every AG is about high availability.

Read-scale availability groups (also called clusterless AGs) provide read-only secondaries without requiring Windows Server Failover Clustering (WSFC).
They’re useful for:
- Scaling out reporting and analytics workloads.
- Offloading read traffic without the complexity of cluster configuration.
Caveats:
- No automatic failover.

If the primary fails, you must manually fail over.

- Purely for offloading reporting, backups (optional) and checkdb work.

Version & Edition Requirements

Enterprise Edition: Full AG support, multiple replicas, readable secondaries, async replicas across regions.
Standard Edition: Basic AGs (since SQL 2016 SP1), limited to 2 nodes, 1 database per AG, no read-scale.
- Minimally useful for one or two databases…more than that can cause thread starvation.
Windows / Linux:
- AGs require WSFC (Windows) or Pacemaker (Linux).
- Read-scale AGs don’t require clustering.

Downsides of AGs

Instance-level objects not included: Logins, SQL Agent jobs, linked servers must be synced separately.
Licensing cost: Enterprise Edition required for full features.
Operational complexity: Requires careful planning for quorum, replicas, failover modes, and monitoring.
Storage overhead: Each replica requires its own storage and capacity.

When to Choose AGs

Pick AGs if:
- You need readable secondaries for offloading.
- You need database-level granularity for HA/DR.
- You want cross-datacenter disaster recovery.
Pick FCIs if:
- You need instance-level protection.
- You have reliable shared storage.
- You want simpler management without replica sync overhead.

The Bottom Line:

Availability Groups offer flexibility that FCIs never could: readable secondaries, offloaded backups, and multi-datacenter options. But that power comes with complexity, higher licensing costs, and the need to manage instance-level objects separately.

If you need read scale or cross-site DR, AGs are often the right choice. If you just need your instance to stay alive, FCIs might still be enough.

Are you ready for a REAL database disaster?

Take the quiz to find out:

SQL Server Recovery Assessment | 2-Minute Quiz

SQL Tidbit:

AG’s come with a pretty handy, auto-refreshing dashboard in SSMS, complete with customizable columns for the metrics most important to you!

SQL Server Failover Cluster Instances

September 3, 2025 by Kevin3NF Leave a Comment

Old Reliable Still Matters

If you’ve been around SQL Server for a while, you’ve heard of Failover Cluster Instances (FCIs). They’ve been part of SQL’s high availability toolbox since long before Availability Groups showed up, and they’re still relevant today, especially if you want protection at the instance level, not just the database level.

Let’s break down what they are, how they differ from AGs, and where they make sense in your environment.

What Is an FCI?

A Failover Cluster Instance is a single SQL Server instance installed across multiple Windows Server nodes. At any given time, one node is active, while the others are standing by. If the active node fails, Windows Server Failover Clustering (WSFC) brings the SQL instance online on another node.

The big requirement? Shared storage. All nodes point to the same database files, whether that’s on a SAN, SMB file share, or Storage Spaces Direct.

How FCIs Differ from Availability Groups

Storage:
- FCI: one copy of the data on shared storage.
- AG: multiple synchronized copies on separate storage.
Scope:
- FCI: protects the whole instance (system databases, SQL Agent jobs, logins, SSIS packages, everything).
- AG: protects only selected user databases.
Client Connections:
- FCI: uses a Virtual Network Name (VNN). Apps connect to a single name, and after failover, that name points to the new active node.
- AG: uses an AG Listener, which looks and behaves similarly, but is tied to a group of databases instead of the entire instance.
Use Cases:
- FCI: protects against server or instance failure.
- AG: adds database-level resilience, readable secondaries, and cross-datacenter flexibility (Distributed AG)

FCI Version & Edition Requirements

Windows Server: Requires WSFC.
SQL Server Editions:
- Standard: limited to 2 nodes.
- Enterprise: supports multiple nodes, multi-subnet clusters, and advanced options.
Storage: Must be shared, unless you’re on newer Windows with Storage Spaces Direct.

Benefits of FCIs

Instance-level protection: Everything tied to SQL Server moves during failover.
Automatic failover: Fast recovery from node crashes, no manual intervention required.
No data divergence: One data copy = no sync overhead.
Connection simplicity: The Virtual Network Name hides node-level details from applications.

Downsides & Risks

Shared storage dependency: If the SAN or file share fails, the cluster fails.
No readable secondaries: Unlike AGs, you can’t offload reporting or backups.
Longer failover times: The whole instance must restart on the new node.
Complex setup: Requires WSFC expertise, quorum planning, and strong storage design.

When to Choose FCI vs AG

Pick FCI if:
- You want full-instance protection, including jobs, logins, and system databases.
- You already have a reliable SAN or shared storage.
- You need a single connection name (VNN) that hides failovers from applications.
Pick AG if:
- You want multiple data copies for extra resilience.
- You need readable secondaries for reporting or backups.
- You want DR across data centers without shared storage.
- You have personnel to maintain the underlying WSFC components
  - Clusterless / Read-Scale AGs are an option

The Bottom Line

Failover Cluster Instances aren’t flashy, but they’re still solid. They protect more than just databases, they’re available in Standard Edition, and for many shops they’re the right balance of reliability and simplicity.

Just remember: your cluster is only as good as your storage.

New Pocket DBA® clients get the first month FREE!

https://pocket-dba.com/

Book a call, and mention “Newsletter”

SQL Tidbit

When connecting to an FCI, always use the Virtual Network Name (VNN) — never the node name. This way, your applications don’t care which node is currently active. Failover happens, your apps keep talking, and you don’t get that 3 a.m. call.

Thanks for reading!

SQL Server Log Shipping: The Tried and True of DR

August 27, 2025 by Kevin3NF Leave a Comment

It’s not glamorous, but it works

In a world where shiny new HA/DR features get all the press, there’s one SQL Server technology that just keeps doing its job.

Log Shipping has been around since SQL Server 2000. It doesn’t make headlines, it doesn’t have fancy dashboards, and it’s not going to win you any architecture awards. But for the right environment and use case, it’s rock solid and can save your bacon job in a disaster.

Backup, copy, retore. With Automation.

What is Log Shipping?

At its core, log shipping is an automated process that:

Backs up the transaction log from your primary database.
Copies that backup to one or more secondary servers.
Restores the log onto the secondary.

Do that on a schedule (15 minutes is the default) and your secondary database stays nearly up to date with production.

Requirements:

Database in FULL recovery model
SQL Agent running (Log Shipping is powered by jobs)
Shared/network path for moving the log backups

SQL Server Versions & Editions

Works in Standard, Enterprise, and Developer Editions — not in Express.
Available since SQL Server 2000, still here in 2022.
Works in on-premises or cloud/colo VM deployments (Azure SQL VM, AWS EC2, etc.).

Why It’s Disaster Recovery, Not High Availability

HA means automatic failover with near-zero downtime. That’s not log shipping.

With log shipping:

Failover is manual – you bring the secondary online when needed, database by database
RPO (data loss) is whatever gap exists between the last backed up log and the failure.
RTO (downtime) is how long it takes to restore the last backups and bring the database online.

It’s DR because you can be up in minutes to hours, but not instantly.

Setting Up Log Shipping (The Basics)

You can walk through the wizard in SSMS, and this really the easiest way in my opinion:

Primary server: Configure the log backup job.
Secondary server: Configure the copy and restore jobs.
Schedule jobs: As a general rule I backup frequently, copy frequently (if using the copy at all) and build in a delay on the restore side. This helps give you a window to catch an “oops” moment, rather than restoring the oops to the secondary server.
Initial restore: Seed the secondary with a full backup (WITH NORECOVERY).

Monitoring & Maintenance

Log Shipping has some built-in monitoring functionality:

Instance level report: Server properties>>Reports>>Standard Reports>>Transaction Log Shipping Status
Alert jobs created for thresholds not being met. You may have to adjust these for databases that can go for long periods of time with no activity to avoid false alerts.
Set up alerts for job failures on the Backup, copy and restore jobs. If you backup to and restore from the same file location you don’t need the copy job.

Failover & Failback

Failover (manual):

Stop the log shipping jobs.
Restore any remaining logs.
Bring the secondary online (RESTORE DATABASE myDatabase WITH RECOVERY).

Failback:

Rebuild the original setup from scratch. This will leave you with copies of all of the jobs on each server in a 2 node setup. Make sure to disable the jobs that don’t make sense, such as LS_restore on the Primary.

When to Use Log Shipping

Low budget DR solution
Simple read-only reporting copy (standby mode)
Decent for setting up a reporting server when near real-time data isn’t a requirement.

The Bottom line:

Log Shipping isn’t shiny, but it’s dependable. If you can live with a few minutes of potential data loss and a manual failover, it’s a cost-effective way to add resilience to your SQL Server environment.

New Pocket DBA® clients get the first month FREE!

https://pocket-dba.com/

Book a call, and mention “Newsletter”

Thanks for reading!

— Kevin

Why Your SQL Server Backups Aren’t Enough

April 9, 2025 by Kevin3NF Leave a Comment

SQL Server backups are database insurance – you don’t really appreciate them until disaster strikes. But here’s the kicker: having backups doesn’t mean your data is safe. If you’re not regularly doing test restores, your backup strategy is a ticking time bomb.

The False Sense of Security

Many IT leaders and system admins think, “We have full backups every night. We’re covered.” But when the time comes to restore, they discover:

· The backup file is corrupt.
· The storage location is inaccessible.
· The restore process takes way longer than expected.
· The recovery model wasn’t configured properly.
· The point-in-time restore doesn’t actually bring back the data they need.

At that point, it’s not a “backup strategy.” It’s a data loss incident.

Why Restores Must Be Tested

SQL Server backups are only as good as your ability to restore them. If you’re not proactively testing restores, you’re just hoping they work when disaster strikes. Hope is not a strategy.

• Test different restore scenarios: Full, differential, and transaction log restores.
• Validate integrity with DBCC CHECKDB after restores.
• Ensure RTO and RPO align with business needs. (Translation: Can you restore fast enough to avoid a business meltdown?)
• Automate restore tests so you’re not guessing when the pressure is on.

Automating Restore Tests with dbatools

If you’re not testing restores because it’s “too time-consuming,” dbatools.io has your back. The Test-DbaLastBackup command automates the process by:

1. Restoring the last backup of a database to a test environment.
2. Running DBCC CHECKDB to ensure data integrity.
3. Reporting on success or failure, so you know where issues exist before you actually need the backup.

Example Command:
Test-DbaLastBackup -SqlInstance YourSQLServer -Databases YourDatabase

This takes a lot of manual effort out of the equation—no more excuses for not testing!

Action Plan for SQL Server Resilience

1. Schedule regular test restores (weekly or monthly) and document the process.
2. Use DBCC CHECKDB to verify database integrity post-restore.
3. Leverage Test-DbaLastBackup to automate restore verification.
4. Confirm backup files are accessible and stored securely offsite.
5. Train your team on recovery processes—because backups are useless if no one knows how to restore them.
6. Monitor backup success, but more importantly, track restore success.

The Bottom Line:

A backup plan without restore testing is a disaster waiting to happen. Don’t wait until you’re knee-deep in downtime to realize your backups weren’t as solid as you thought. Test your restores. Validate your data. Sleep better at night.

Thanks for reading!

–Kevin

New Pocket DBA® clients get the first month FREE!

Get a Free SQL Server Health Check!

SQL Server Backups are Not a High-Availability Solution

October 17, 2023 by Kevin3NF Leave a Comment

Please continue doing your backups!

Backups are Disaster Recovery, yes…but not HA.

Some will argue with this (in the comments most likely), but I broadly define “High Availability” as a system that can recover in seconds or minutes at most. Sometimes that is automatic, sometimes manual.

Backups might be quick to restore IF they are small enough and the right people are in place (not at lunch or otherwise out). But automated restores to prod just aren’t a thing.

SQL Server has this cute little marketing term called “Always On” which is nonsense. Always? Really? 9 Nines?

Always On covers both Failover Cluster Instances and Availability Groups. There are significant differences between the two. Both depend on the O/S Cluster…and they diverge a LOT from there. They are both HA.

Log Shipping (ancient tech) is great for DR and hitting your RPO number, but failover to a log shipped copy is manual.

Replication is not and never will be an HA or DR solution. Some things cannot be replicated, so they are lost if the publication database goes poof.

There are of course things outside of SQL Server that can help you hit your RPO/RTO goals. Feel free to share them.

What are you using for your HA solution?

Thanks for reading!

Kevin3NF

Follow @Dallas_DBAs