
DallasDBAs.com

SQL Server Database Consulting

SQL Server Failover Cluster Instances

September 3, 2025 by Kevin3NF

Old Reliable Still Matters

If you’ve been around SQL Server for a while, you’ve heard of Failover Cluster Instances (FCIs). They’ve been part of SQL’s high availability toolbox since long before Availability Groups showed up, and they’re still relevant today, especially if you want protection at the instance level, not just the database level.

Let’s break down what they are, how they differ from AGs, and where they make sense in your environment.

What Is an FCI?

A Failover Cluster Instance is a single SQL Server instance installed across multiple Windows Server nodes. At any given time, one node is active, while the others are standing by. If the active node fails, Windows Server Failover Clustering (WSFC) brings the SQL instance online on another node.

The big requirement? Shared storage. All nodes point to the same database files, whether that’s on a SAN, SMB file share, or Storage Spaces Direct.
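If you've inherited an instance and aren't sure whether it's an FCI, SQL Server can tell you. A quick check (the DMV columns shown are available in SQL Server 2012 and later):

```sql
-- 1 = this instance is a Failover Cluster Instance
SELECT SERVERPROPERTY('IsClustered') AS IsClustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentActiveNode;

-- List the WSFC nodes this instance can run on (empty on a standalone install)
SELECT NodeName, status_description, is_current_owner
FROM sys.dm_os_cluster_nodes;
```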

How FCIs Differ from Availability Groups

  • Storage:
    • FCI: one copy of the data on shared storage.
    • AG: multiple synchronized copies on separate storage.
  • Scope:
    • FCI: protects the whole instance (system databases, SQL Agent jobs, logins, SSIS packages, everything).
    • AG: protects only selected user databases.
  • Client Connections:
    • FCI: uses a Virtual Network Name (VNN). Apps connect to a single name, and after failover, that name points to the new active node.
    • AG: uses an AG Listener, which looks and behaves similarly, but is tied to a group of databases instead of the entire instance.
  • Use Cases:
    • FCI: protects against server or instance failure.
    • AG: adds database-level resilience, readable secondaries, and cross-datacenter flexibility (Distributed AGs).

 

FCI Version & Edition Requirements

  • Windows Server: Requires WSFC.
  • SQL Server Editions:
    • Standard: limited to 2 nodes.
    • Enterprise: supports multiple nodes, multi-subnet clusters, and advanced options.
  • Storage: Must be shared, unless you’re on newer Windows with Storage Spaces Direct.

 

Benefits of FCIs

  • Instance-level protection: Everything tied to SQL Server moves during failover.
  • Automatic failover: Fast recovery from node crashes, no manual intervention required.
  • No data divergence: One data copy = no sync overhead.
  • Connection simplicity: The Virtual Network Name hides node-level details from applications.

 

Downsides & Risks

  • Shared storage dependency: If the SAN or file share fails, the cluster fails.
  • No readable secondaries: Unlike AGs, you can’t offload reporting or backups.
  • Longer failover times: The whole instance must restart on the new node.
  • Complex setup: Requires WSFC expertise, quorum planning, and strong storage design.

 

When to Choose FCI vs AG

  • Pick FCI if:
    • You want full-instance protection, including jobs, logins, and system databases.
    • You already have a reliable SAN or shared storage.
    • You need a single connection name (VNN) that hides failovers from applications.
  • Pick AG if:
    • You want multiple data copies for extra resilience.
    • You need readable secondaries for reporting or backups.
    • You want DR across data centers without shared storage.
    • You have personnel to maintain the underlying WSFC components
      • Clusterless / Read-Scale AGs are an option

 

The Bottom Line

Failover Cluster Instances aren’t flashy, but they’re still solid. They protect more than just databases, they’re available in Standard Edition, and for many shops they’re the right balance of reliability and simplicity.

Just remember: your cluster is only as good as your storage.

 


New Pocket DBA® clients get the first month FREE!

https://pocket-dba.com/

Book a call, and mention “Newsletter”


SQL Tidbit

When connecting to an FCI, always use the Virtual Network Name (VNN) — never the node name. This way, your applications don’t care which node is currently active. Failover happens, your apps keep talking, and you don’t get that 3 a.m. call.

 

Thanks for reading!

 

Filed Under: HADR, SQL Tagged With: syndicated

SQL Server Log Shipping: The Tried and True of DR

August 27, 2025 by Kevin3NF

It’s not glamorous, but it works

In a world where shiny new HA/DR features get all the press, there’s one SQL Server technology that just keeps doing its job.

Log Shipping has been around since SQL Server 2000. It doesn’t make headlines, it doesn’t have fancy dashboards, and it’s not going to win you any architecture awards. But for the right environment and use case, it’s rock solid and can save your bacon in a disaster.

Backup, copy, restore. With automation.

What is Log Shipping?

At its core, log shipping is an automated process that:

  1. Backs up the transaction log from your primary database.
  2. Copies that backup to one or more secondary servers.
  3. Restores the log onto the secondary.

Do that on a schedule (15 minutes is the default) and your secondary database stays nearly up to date with production.

Requirements:

  • Database in FULL recovery model
  • SQL Agent running (Log Shipping is powered by jobs)
  • Shared/network path for moving the log backups
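A quick way to verify the recovery model, and fix it if needed (the database name is a placeholder; take a full backup after switching to FULL so the log chain has a starting point):

```sql
-- Check the recovery model of the database you want to log ship
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- Switch to FULL recovery if it isn't already
ALTER DATABASE YourDatabase SET RECOVERY FULL;
```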

 

SQL Server Versions & Editions

  • Works in Standard, Enterprise, and Developer Editions — not in Express.
  • Available since SQL Server 2000, still here in 2022.
  • Works in on-premises or cloud/colo VM deployments (Azure SQL VM, AWS EC2, etc.).

 

Why It’s Disaster Recovery, Not High Availability

HA means automatic failover with near-zero downtime. That’s not log shipping.

With log shipping:

  • Failover is manual – you bring the secondary online when needed, database by database
  • RPO (data loss) is whatever gap exists between the last backed up log and the failure.
  • RTO (downtime) is how long it takes to restore the last backups and bring the database online.

It’s DR because you can be up in minutes to hours, but not instantly.

Setting Up Log Shipping (The Basics)

You can walk through the wizard in SSMS, and this is really the easiest way in my opinion:

  1. Primary server: Configure the log backup job.
  2. Secondary server: Configure the copy and restore jobs.
  3. Schedule jobs: As a general rule I back up frequently, copy frequently (if using the copy at all), and build in a delay on the restore side. This helps give you a window to catch an “oops” moment, rather than restoring the oops to the secondary server.
  4. Initial restore: Seed the secondary with a full backup (WITH NORECOVERY).
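Step 4, the initial seed, looks something like this (paths and logical file names are illustrative; check yours with RESTORE FILELISTONLY):

```sql
-- On the primary: take the seed full backup to the shared path
BACKUP DATABASE YourDatabase
TO DISK = N'\\FileShare\LogShipping\YourDatabase_seed.bak';

-- On the secondary: restore WITH NORECOVERY so log restores can be applied
RESTORE DATABASE YourDatabase
FROM DISK = N'\\FileShare\LogShipping\YourDatabase_seed.bak'
WITH NORECOVERY,
     MOVE N'YourDatabase'     TO N'D:\Data\YourDatabase.mdf',
     MOVE N'YourDatabase_log' TO N'L:\Logs\YourDatabase_log.ldf';
```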

 

Monitoring & Maintenance

Log Shipping has some built-in monitoring functionality:

  1. Instance level report: Server properties>>Reports>>Standard Reports>>Transaction Log Shipping Status
  2. Alert jobs created for thresholds not being met. You may have to adjust these for databases that can go for long periods of time with no activity to avoid false alerts.
  3. Set up alerts for job failures on the backup, copy, and restore jobs. If you back up to and restore from the same file location you don’t need the copy job.

 

Failover & Failback

Failover (manual):

  1. Stop the log shipping jobs.
  2. Restore any remaining logs.
  3. Bring the secondary online (RESTORE DATABASE myDatabase WITH RECOVERY).
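If the primary is still reachable, grab the tail of the log before bringing the secondary online; it closes the data-loss gap. A sketch (names and paths are placeholders):

```sql
-- On the primary (if accessible): back up the tail of the log.
-- WITH NORECOVERY leaves the old primary in RESTORING state,
-- ready to become the new secondary later.
BACKUP LOG YourDatabase
TO DISK = N'\\FileShare\LogShipping\YourDatabase_tail.trn'
WITH NORECOVERY;

-- On the secondary: apply the tail, then bring the database online
RESTORE LOG YourDatabase
FROM DISK = N'\\FileShare\LogShipping\YourDatabase_tail.trn'
WITH NORECOVERY;

RESTORE DATABASE YourDatabase WITH RECOVERY;
```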

Failback:

  • Rebuild the original setup from scratch. This will leave you with copies of all of the jobs on each server in a 2 node setup. Make sure to disable the jobs that don’t make sense, such as LS_restore on the Primary.

 

When to Use Log Shipping

  • Low budget DR solution
  • Simple read-only reporting copy (standby mode)
  • Decent for setting up a reporting server when near real-time data isn’t a requirement.

 

The Bottom Line:

Log Shipping isn’t shiny, but it’s dependable. If you can live with a few minutes of potential data loss and a manual failover, it’s a cost-effective way to add resilience to your SQL Server environment.




Thanks for reading!

— Kevin

Filed Under: HADR, SQL Tagged With: syndicated

SQL Server Backups: The Basics

August 20, 2025 by Kevin3NF

If you’re responsible for a SQL Server instance, you need working, consistent backups. Not just a .bak file here and there, but a plan that runs automatically and covers the full recovery cycle.

Here’s how to get that in place, even if you’re not a DBA.

Understand What Each Backup Type Does:

You don’t need them all every time, but you do need to know what they’re for:

  • Full Backup
    A complete copy of the entire database at that moment in time. It’s your foundation.
  • Differential Backup
    Captures only what changed since the last full backup. These can help speed up recovery time and reduce storage needs. Not really necessary if your databases are small.
  • Transaction Log Backup
    Captures everything written to the transaction log since the last log backup. Needed for point-in-time recovery.

    • If your database is in Full or Bulk-Logged recovery model and you’re not doing log backups, your log file will grow endlessly, potentially filling the drive it is on.
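Here’s one of each, stripped down to the basics (database name and paths are made up; COMPRESSION requires Standard Edition 2008 R2 or later):

```sql
-- Full: the foundation
BACKUP DATABASE YourDatabase
TO DISK = N'D:\SQLBackups\YourDatabase_full.bak'
WITH CHECKSUM, COMPRESSION;

-- Differential: everything changed since the last full
BACKUP DATABASE YourDatabase
TO DISK = N'D:\SQLBackups\YourDatabase_diff.bak'
WITH DIFFERENTIAL, CHECKSUM, COMPRESSION;

-- Log: everything since the last log backup; enables point-in-time restore
BACKUP LOG YourDatabase
TO DISK = N'D:\SQLBackups\YourDatabase_log.trn'
WITH CHECKSUM, COMPRESSION;
```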

 

Set a Backup Schedule That Works

For production databases, this is my minimum recommended setup:

  • Full backups once per day
  • Log backups every 5 to 15 minutes
  • Optional differentials every few hours for large databases

For dev/test databases:

  • Full backups daily or weekly are usually fine
  • You can skip log backups unless you’re testing recovery processes
    • If you are going to skip, set the databases to SIMPLE Recovery
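Switching a dev/test database to SIMPLE is a one-liner (database name is a placeholder):

```sql
-- SIMPLE recovery: no log backups needed, and the log won't grow unchecked
ALTER DATABASE DevDatabase SET RECOVERY SIMPLE;
```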

 

Automate the Backups

Use SQL Server Agent to schedule the jobs. Here are two options:

  • Maintenance Plans (basic, GUI-driven)
    • Good for smaller environments or shops without scripting experience
    • Be careful, default plans may not have the best options for your situation
    • Included in SQL Server, supported by Microsoft.
  • Ola Hallengren’s Maintenance Solution (highly recommended)
    • Free, open-source, script-based
    • Handles full/diff/log backup rotation, cleanup, logging, and more
      • Optionally does corruption checking and index/stats maintenance
    • Use SQL Agent to schedule the process via the jobs the script created
    • Free, FAQ/email/community support, but not Microsoft

 

Store Backups Somewhere Safe

Don’t store them on the same drive as the database files. If the drive dies, the data and backups may both be lost.

Better options:

  • Separate disk or volume
  • Network share
  • Azure Blob Storage or S3, via Backup to URL option
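Backup to URL looks like this (storage account and container names are made up, and a credential for the container must already exist):

```sql
-- Back up straight to Azure Blob Storage
BACKUP DATABASE YourDatabase
TO URL = N'https://yourstorageacct.blob.core.windows.net/sqlbackups/YourDatabase_full.bak'
WITH CHECKSUM, COMPRESSION;
```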

 

Monitor It

Make sure the backup jobs are:

  • Running successfully
  • Completing on time
  • Not overwriting too soon or growing endlessly

Use SQL Agent alerts, third-party tools, or scripts to monitor backup age and job success.
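msdb keeps the backup history, so a quick check of backup age doesn’t need any third-party tools. One way to sketch it:

```sql
-- Most recent backup of each type, per database
SELECT d.name AS DatabaseName,
       MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS LastFull,
       MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date END) AS LastDiff,
       MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS LastLog
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b ON b.database_name = d.name
WHERE d.name <> N'tempdb'
GROUP BY d.name
ORDER BY d.name;
```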

The Bottom Line:

Understanding the basics of what backups are and how they work is KEY to protecting your company’s most valuable asset. If you don’t know how this works and it is your responsibility, a database failure without a backup could be a career-limiting move.




Thanks for reading!

— Kevin

Filed Under: backup, SQL Tagged With: syndicated

DBCC CHECKDB: Just Because It’s Quiet Doesn’t Mean It’s Safe

August 13, 2025 by Kevin3NF

Corruption isn’t a “maybe someday” problem – here’s what you need to do now.

Stop. Don’t panic.

You just ran DBCC CHECKDB for the first time in a while (or maybe ever) and saw something you didn’t expect: the word corruption.

Take a breath.

Don’t detach the database.
Don’t run REPAIR_ALLOW_DATA_LOSS.
Don’t reboot the server or start restoring things just yet.

There’s a lot of bad advice floating around from old blogs, well-meaning forum posts, and even some popular current LinkedIn threads. Some of it might’ve been okay 15 years ago. Some of it is dangerous.

Let’s dig in.

What Corruption Really Means

When SQL Server says there’s corruption, it’s not talking about “bad data” like wrong numbers or missing values. It means it found internal structures that are damaged: the kind of damage that can cause queries to fail, or even make your database unusable.

This could be:

  • Broken data or index pages
  • Allocation inconsistencies (GAM, SGAM, PFS pages)
  • Corrupt system metadata
  • Problems in the transaction log.

This isn’t a performance problem.
It’s a data integrity problem. If left untreated, it can get worse.

How Does Corruption Happen?

Even if your server is well-configured, corruption can still creep in. Common causes include:

  • Failing disks or controllers (especially SANs and older SSDs)
  • Disk subsystems lying about successful writes
  • Power outages or hard shutdowns.
  • Sometimes SQL Server itself has bugs that cause corruption – especially in RTM versions.
  • Snapshot or backup software interfering at the file level
  • Antivirus software scanning .mdf, .ldf, or .ndf files directly

Some of these things leave no obvious signs. This is why running CHECKDB regularly is so important.

What DBCC CHECKDB Actually Does

When you run DBCC CHECKDB, SQL Server performs a deep consistency check of your database:

  • Every table, every index, every system structure
  • Logical and physical page consistency
  • Allocation integrity

If possible, SQL uses a snapshot to avoid locking the database.

What it doesn’t do:

  • Fix anything (unless you tell it to)
  • Prevent corruption
  • Run automatically (unless you set it up)

How Often Should You Run It?

Ideally: once per week, at minimum.

  • Schedule it in a SQL Agent job, off-hours.
  • Save the job output to file or table so you don’t miss warnings.
  • Set up an email alert for failures of this job (as well as corruption alerts for error 823-825)
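Those 823-825 alerts only take a minute to set up. A sketch for error 824 (repeat for 823 and 825; assumes Database Mail is configured and an operator named ‘DBA Team’ already exists):

```sql
-- Alert on error 824 (logical consistency I/O error)
EXEC msdb.dbo.sp_add_alert
     @name = N'Error 824 - logical I/O error',
     @message_id = 824,
     @severity = 0,
     @include_event_description_in = 1;

-- Email the DBA team when it fires
EXEC msdb.dbo.sp_add_notification
     @alert_name = N'Error 824 - logical I/O error',
     @operator_name = N'DBA Team',
     @notification_method = 1;   -- 1 = email
```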

If CHECKDB takes too long or hits your performance too hard, you can offload the work.

Offload CHECKDB with Test-DbaLastBackup

If you’re taking backups regularly (you are, right?), you can use Test-DbaLastBackup from the dbatools.io PowerShell module to verify database consistency (and restorability) without touching production.

This command:

  • Restores your most recent backup to another SQL instance
  • Runs DBCC CHECKDB against the restored copy
  • Confirms both restorable state and internal consistency

 

Test-DbaLastBackup -SqlInstance "ProdSQL" -Destination "TestRestoreSQL" -Database "YourDatabase"

It’s a great way to validate backups and run CHECKDB in a lower-impact environment.
Not a replacement for CHECKDB in production, but a powerful supplement when time or resources are tight.

  • Consider running CHECKDB on a secondary replica if you’re using Availability Groups.
  • If CHECKDB fails due to size or takes too long, it’s even more important to find time and a strategy that works.

What to Do If You Find Corruption

  1. Read the output carefully.
    It tells you which object is affected and how.
  2. Run CHECKDB again to confirm.
    Temporary issues can happen, especially on shared storage.
  3. Do not detach the database.
    Doing so loses the ability to investigate further.
  4. Check your backups.
    Can you restore from before the corruption appeared? This is the first thing Microsoft will tell you when you call support.
  5. If you are really lucky the corruption might be in a non-clustered index, and dropping/recreating that index may solve it for now.
  6. Still stuck?

Read this from Brent Ozar: DBCC CHECKDB Reports Corruption? Here’s What to Do
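For step 5, the CHECKDB output gives you object and index IDs. You can translate them to names like this (the IDs shown are placeholders; use the ones from your own error message):

```sql
-- Map the object_id / index_id from the CHECKDB error to real names
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name AS IndexName,
       i.index_id,
       i.type_desc
FROM sys.indexes i
WHERE i.object_id = 245575913   -- object_id from the error message
  AND i.index_id  = 2;          -- index_id from the error message

-- If index_id > 1, it's a nonclustered index: dropping and recreating it
-- can clear the corruption without losing any data.
```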

About REPAIR_ALLOW_DATA_LOSS

That command does exactly what it says: it removes damaged pages and objects to make the database consistent again—even if that means losing real data.

Use it only when:

  • You have no usable backup
  • You’ve consulted with your team and accepted the risk (get that in writing from your manager/CTO)
  • You’ve tried every other recovery option

If you’re not 100% sure what it’s going to delete (if anything), you’re not ready to run it. This is the sort of thing that can get you fired. So is not having backups.

How to Check When DBCC CHECKDB Was Last Run

This script gives you the last successful run for each database:

SELECT 
    name AS DatabaseName,
    DATABASEPROPERTYEX(name, 'LastGoodCheckDbTime') AS LastCheckDBSuccess
FROM 
    sys.databases
WHERE 
    state_desc = 'ONLINE'
ORDER BY 
    LastCheckDBSuccess DESC;

If the date shows 1900-01-01, CHECKDB has never completed successfully on that database.

The Bottom Line

Corruption doesn’t announce itself with a trumpet. You only know it’s there if you go looking.

CHECKDB gives you an early warning. It’s not glamorous, but it’s essential, especially in environments without a dedicated DBA watching for signs of trouble.

If you’re not running it, you’re flying blind.

If you don’t know what to do when it finds something, now’s the time to prepare.

Don’t panic. But don’t ignore it either.

 

Thanks for reading!

— Kevin

Filed Under: SQL, Troubleshooting Tagged With: syndicated

SQL Server I/O Bottlenecks: It’s Not Always the Disk’s Fault

August 6, 2025 by Kevin3NF

“SQL Server is slow.”

We’ve all heard it. But that doesn’t always mean SQL Server is the problem. And “slow” means nothing without context and the ability to verify.

More often than you’d think, poor performance is rooted in the one thing most sysadmins don’t touch until it’s on fire: the disk subsystem.

Why I/O Bottlenecks Fly Under the Radar

Many IT teams blame queries, blocking, or missing indexes when performance tanks, and sometimes they’re right. But if you’re seeing symptoms like long wait times, timeouts, or sluggish backups, there’s a good chance the underlying storage is at fault. I’ve rarely seen a storage admin agree with this at the onset of the problem, so you need to do the work up front.

Unless you look for I/O issues, you might never find them.

Common Causes of SQL Server I/O Bottlenecks

  • Slow or oversubscribed storage
    Spinning disks, congested SANs, or underpowered SSDs can’t keep up with demand.
  • Outdated or faulty drivers
    We’ve seen HBA or RAID controller driver issues that looked like database bugs.
  • Auto-growths triggered during business hours
    Small filegrowth settings lead to frequent stalls. Instant File Initialization helps this. If you cannot use IFI, manually grow your data files off-hours.
  • Bad indexing or bloated tables
    Too much data read, written, and maintained.
  • Unused indexes
    Every insert, update, or delete has to update them, whether they’re used or not. This one is a killer. My script is based on one my friend Pinal Dave wrote many years ago.
  • Data, log, and tempDB all sharing a volume
    A recipe for write contention and checkpoint stalls. The more separation you can do, the better. If everything is going through one controller, this might not help, especially in a VMware virtual controller configuration.
  • VM storage contention or thin provisioning
    Your VM’s dedicated storage might not be as dedicated as you think. Check with your admin to see if VMs have moved around and you are now in a “noisy neighbor” situation.

 

What Do “Good” Disk Numbers Look Like?

If you’re not sure what “normal” looks like for your disks, measure a baseline first. As a rough rule of thumb, average read and write latencies in the single-digit milliseconds are healthy, and anything consistently above 20-30ms deserves a closer look.

You can get these numbers using:

  • sys.dm_io_virtual_file_stats
  • Performance Monitor (Avg. Disk sec/Read, Disk Queue Length)
  • Disk benchmarking tools like CrystalDiskMark (local test environments)
  • Resource Monitor>>Disk tab is a quick and easy way to see visually where the disks are spending their time, if you are on the server.
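Here’s one way to pull per-file latency from sys.dm_io_virtual_file_stats (the numbers are cumulative since the last SQL Server restart, so read them as long-run averages):

```sql
-- Average read/write latency per database file, since last restart
SELECT DB_NAME(vfs.database_id) AS DatabaseName,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_reads  = 0 THEN 0
            ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS AvgReadMs,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS AvgWriteMs
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY AvgReadMs DESC;
```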

 

Fixes and Workarounds

  • Identify and reduce high physical reads
    These indicate SQL Server is constantly pulling data from disk, which could be caused by poor indexing, insufficient memory, or queries reading too much data. sp_BlitzCache from Ozar can help with this. Use @SortOrder = ‘reads’ or ‘avg reads’. sp_WhoIsActive can help if the issue is ongoing.
  • Tune queries with high reads
    Even if a query runs from memory, it can churn the buffer pool and evict useful pages, leading to other queries hitting disk more often.
  • Set reasonable autogrowth sizes
    Growing in 1MB chunks? That’s going to hurt. Aim for larger, consistent growth settings, especially for TempDB and transaction logs.
  • Move files to better storage
    Separate data, logs, TempDB, and backups if possible. SSDs or NVMe where it counts.
  • Clean up unused indexes
    If they’re not used for reads, they’re just extra write overhead. Especially your audit and logging tables that rarely get queried.
  • Keep your drivers and firmware current
    Storage vendors quietly fix performance bugs all the time.
  • Monitor your VM host’s disk utilization
    Especially in shared environments. Noisy neighbors can take you down.
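A starting point for the unused-index cleanup. This is a simplified sketch, not the script mentioned above, and usage stats reset on every restart, so review carefully before dropping anything:

```sql
-- Nonclustered indexes with writes but zero reads since the last restart
SELECT OBJECT_NAME(s.object_id) AS TableName,
       i.name AS IndexName,
       s.user_updates AS Writes
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND i.type_desc = 'NONCLUSTERED'
  AND s.user_seeks + s.user_scans + s.user_lookups = 0
ORDER BY s.user_updates DESC;
```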

 

The Bottom Line:

SQL Server does a lot of things right, but it can’t make slow storage go faster. Verify the storage is the likely culprit before you go yell at the storage admin.

Before you throw more CPU or memory at a problem, take a closer look at your I/O path. You might just find the real bottleneck isn’t SQL Server at all.

Thanks for reading!

— Kevin

 

Filed Under: Configuration, Performance Tuning, SQL Tagged With: syndicated

SQL Server Maintenance Plans

July 30, 2025 by Kevin3NF

If you’re a DBA, sysadmin, IT manager, or Accidental DBA, you’ve probably seen SQL Server’s built-in Maintenance Plans. They live right there in SSMS under the “Management” node, quietly offering to take care of your backups, index maintenance, integrity checks, random T-SQL tasks and more.

They look simple. They are simple. But that doesn’t mean they’re always the best solution.

 

What Maintenance Plans Can Do

Microsoft added Maintenance Plans to make basic tasks like backups accessible, especially in environments without a dedicated DBA.
The wizard-driven interface lets you:

  • Schedule Full, Differential, and Transaction Log backups
  • Perform index maintenance
  • Run DBCC CHECKDB
  • Execute basic cleanup tasks
  • Run T-SQL commands as part of the “flow”

And it all runs under SQL Server Agent so you can automate with just a few clicks.

 

What Maintenance Plans Can’t Do Well

Ease of use comes at the cost of flexibility.

Here’s where they fall short:

  • Limited control: You can’t fine-tune logic or dynamically skip steps based on conditions. Not without a lot of fiddling around in the SSIS canvas, at least.
  • LOTS of clicking, dragging, dropping, Googling, etc. if you are new to MPs. The Wizard will make some basic decisions for you.
  • Logging is basic: Failures often go unnoticed unless you’re checking manually. If a MP job fails, the reason is in the MP history, not the job history. Makes perfect sense.
  • Weird defaults: If you choose to create an index rebuild plan, it defaults to 30% or more fragmentation and 1000 pages. That’s a LOT of time spent on teeny tiny 8MB indexes. Unless a page isn’t 8KB anymore.

If you’re working in a mission-critical or highly regulated environment, these gaps can cause trouble.

 

They’re Not Useless

Don’t get me wrong. Maintenance Plans have their place.

Especially if you’re:

  • Running one SQL Server instance with a couple of databases
  • Trying to get any backups in place after years of neglect. Any backup is better than no backup…but that’s a different post
  • Buying time until a better strategy is in place

 

Step-by-Step: How to Create a Full Backup Maintenance Plan

Let’s walk through the simplest case: backing up all user databases once a day.

  1. Launch the Wizard
  • In SSMS, expand Management
  • Right-click Maintenance Plans
  • Choose Maintenance Plan Wizard
  2. Name & Schedule the Plan
  • Click Next on the welcome screen
  • Name your plan (e.g., Nightly Full Backup)
  • Choose Single schedule for the entire plan
  • Click Change to set the schedule:
    • Frequency: Daily
    • Time: 2:00 AM (or another low-traffic time)
    • Recurs every: 1 day
  • Click OK, then Next
  3. Choose Task Type
  • Check only Back Up Database (Full) → Next
  4. Configure Backup Task
  • Databases: Select All user databases (or hand-pick)
  • Backup to: Disk → Choose or create a folder (e.g., D:\SQLBackups\)
    • URL is an option, for cloud storage.
  • Optional:
    • Create a sub-directory per database
    • Set backup expiration
    • Enable checksum
  • Click Next
  5. Reporting (Optional)
  • Save report to a text file or enable email notifications
    • The default is the same directory your SQL ERRORLOGs are living in.
  6. Finish
  • Review the summary
  • Click Finish to create and schedule the plan

Done. Backups will now run on schedule, and you’ve taken a first step.

But now you need to repeat that process for all the other maintenance tasks (Log backups, stats maintenance, CheckDB, etc.)

 

There’s a Better Way

Once you’re past the basics, most SQL Server professionals recommend moving on from Maintenance Plans. Here’s what they use:

Ola Hallengren’s Maintenance Solution

Free, flexible, and widely used in the SQL community.

  • Modular design
  • Intelligent scheduling
  • Excellent logging
  • Works with the SQL Agent
  • VERY simple setup. Please run this against a ‘DBA’ database, not master or msdb.
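Once installed, Ola’s backup procedure is a single call per backup type. A typical full-backup job step might look like this (parameter values are illustrative, and the ‘DBA’ database matches the install advice above):

```sql
-- Full backups of all user databases, verified, with 72-hour retention
EXECUTE DBA.dbo.DatabaseBackup
    @Databases   = 'USER_DATABASES',
    @Directory   = N'D:\SQLBackups',
    @BackupType  = 'FULL',
    @Verify      = 'Y',
    @CheckSum    = 'Y',
    @CleanupTime = 72;
```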

SQL Server Agent Jobs with Custom T-SQL

More setup time, but gives you full control over backup paths, logging, and error handling.

Third-Party Tools

If budget allows, options like Redgate SQL Backup or Idera SQL Safe Backup can offer robust UIs, centralized management, and alerts.

 

The Bottom Line

Maintenance Plans are training wheels.

They’ll get you moving, but they’re not built for high-speed, high-traffic, or high-stakes environments.

If you’re serious about protecting your data, build a better backup strategy. But if you’re just getting started and need a win? A Maintenance Plan will get you there.

Filed Under: backup, Configuration, SQL, SSMS Tagged With: syndicated


