Emergency & Troubleshooting

“SQL Server Is Slow” part 1 of 4

September 17, 2025 by Kevin3NF Leave a Comment

How should you respond when you get the dreaded Email/Slack/Text/DriveBy from someone yelling at you that SQL Server is slow?

Stop. Don’t Open SSMS Yet.

You’ve heard it before: “The server is slow.”

What does that actually mean?

If you jump straight into SQL Server looking for blocking, bad queries, or a missing index, you’re working with bad input. And bad input = wasted time.

The real work starts before you ever connect to SQL Server.

Intake & Scoping Comes First

Your first job is to turn “slow” into something measurable and time-bound. That means gathering evidence, building a timeline, and classifying the type of incident.

Translate “Slow” Into Scenarios

“Slow” can mean very different things depending on who you ask:

Login delays – Applications take 30 seconds to connect.
Application lag – a report that usually runs in 2 seconds now takes 30.
Broad latency – all users complain at once.
Maintenance overruns – ETL jobs not finishing in their normal window.
Timeouts or errors – not just lag, but outright failures.

Each one leads you in a different direction.

The Fast Intake Questionnaire

Ask these every single time:

What exactly was slow? (screen/report/job/login)
When did it start? (date/time, not “earlier today”)
How widespread? (one user, team, region, everyone)
Repeatable? (every time, sometimes, once)
What changed? (deploys, patches, config, new data)
Is it happening now? (yes/no)
Evidence (screenshots, error messages, job names with timestamps)

Classify the Incident

Once you have real answers, put it in a bucket:

Happening now – live triage needed.
Recurring/pattern – happens on a schedule.
Sporadic – random spikes.
One-time – tied to a patch, deploy, or outage.

Build the Timeline

Anchor events with real timestamps.

“Around lunch” = useless.
“9:12 AM CST, FinanceApp login timeout” = actionable.

Capture start, stop, duration, who was affected, and what else was happening on the system at that time.

The (first) Bottom Line:

Action without information is useless. Any other service provider (ER, plumber, Electrician, auto mechanic) will ALWAYS triage your issues before starting an investigation or fix. You should too.

SQL Tidbit:

When gathering timelines, align your clocks. SQL Server may log in UTC while users report in Central or Eastern time. A “9 AM slowdown” can easily show up in SQL logs at 2 PM. Get this wrong and you’ll chase the wrong evidence.

Thanks for reading!

Your SQL Server Is Whispering…Are You Listening?

May 7, 2025 by Kevin3NF Leave a Comment

Early Warnings of a Meltdown (and How to Catch Them)

Most SQL Server crashes don’t come out of nowhere.
They leave breadcrumbs – red flags that something’s not right. The problem? If you don’t know where to look, you miss the signs…until it’s 2am and your CEO’s calling.

Let’s talk about how to listen for those whispers before they turn into full-blown alarms.

The Canary in the Query Mine

Here are a few subtle warning signs I see often before a server falls over:

Backup jobs start taking longer for no obvious reason.
TempDB grows larger than usual
Login failures spike
High CPU or long-running queries suddenly become “normal.”
End users QUIT complaining…because they’ve gotten used to the problem.
DBCC CHECKDB gets skipped because “it blocks things”

If any of these ring a bell, your SQL Server might be trying to tell you something.

Logs You Shouldn’t Ignore

Quick reality check: Most people never check the logs until the server is on fire, or after

SQL Server Errorlog:
- Corruption messages (Severity 24 and 25).
  - Run this: Select * from msdb.dbo.suspect_pages
- Failed login attacks (possible brute force or misconfigurations).
  - Some of these might be from vulnerability testing tools and can be ignored. Might.
- IO warnings that your storage is too slow or unreliable.
- How to find the SQL Server ErrorLog on disk
Agent job failures that are just quietly failing in the background.
Windows Event logs
Cluster logs

Pro tip: Set a calendar reminder to review logs once a week. Seriously—it takes 5 minutes and could save your bacon.

DIY Monitoring: No Budget? No Excuse.

You don’t need fancy tools to catch problems early. Here are a few dead-simple tricks:

SQL Agent job that checks disk space and emails you when it drops below 10%.
Query to track the longest-running queries over time—schedule it to run daily.
Custom SQL alerts for CPU over 90% for more than X minutes.

Set Up Built-In SQL Alerts (Takes 5 Minutes)

SQL Server has a native Alerting feature. No third-party tools required.

Here’s how to set up critical alerts for things you never want to miss:

--Step 1:

EXEC msdb.dbo.sp_add_operator 
    @name = N'DBA_Alerts', 
    @enabled = 1, 
    @email_address = N'[email protected]';
-- This requires Database Mail and an SMTP server

--Step 2
-- Severity 19-25 (from “Something’s wrong” to “We’re toast”)
-- Change and run for each severity you want to track. Sev 20 may give false positives

EXEC msdb.dbo.sp_add_alert 
    @name = N'Severity 19', 
    @message_id = 0, 
    @severity = 19, 
    @notification_message = N'Severity 19 error occurred', 
    @job_id = NULL, 
    @enabled = 1, 
    @delay_between_responses = 300, 
    @include_event_description_in = 1, 
    @database_name = N'master', 
    @operator_name = N'DBA_Alerts', 
    @notification_method = 1;

Repeat this for severity levels 19 to 25, or use a script to loop through them.
Bonus: Create alerts for SQL error numbers like:

823, 824, 825 – signs of potential data corruption
9002 – log file full

The Bottom Line

SQL Server doesn’t generally go down without a fight. But it does give off signals.
Learn to listen, and you’ll dramatically reduce those “why is the website down?”.

Database Emergencies Cost Thousands Per Minute

When SQL Server fails, you need expert help fast.
For a limited time: First month of Pocket DBA® FREE with a 6-month plan.

• Senior DBAs available 24/7 • 1-hour SLA • Proactive problem prevention

Don’t wait for a disaster. Be prepared.
CLAIM FREE MONTH
Offer ends June 1st. Limited availability.

Thanks for reading!

–Kevin

The Cost of Not Having a DBA

April 2, 2025 by Kevin3NF Leave a Comment

Does skipping a DBA save money? Wait until your system grinds to a halt, your backups fail, or your cloud bill skyrockets.

The “DBA? We Don’t Need No Stinkin’ DBA” Mindset

Common Excuses That Lead to Expensive Problems:

“Our system runs fine,”
“The cloud takes care of it”
“Our sysadmin is really smart”
“DBAs are expensive”

SQL Server is an Enterprise level database product with so many facets to it you could make 5 different careers developing and supporting it (DBA, Developer, BI Stack, Analyst and more). Terabytes of data, thousands of transactions per second.

I have seen repeated instances of someone not fully trained as a DBA reaching out to Google (faster than vendor support) for answers only to find a 15-year-old blog post and start throwing ancient answers at the server. More than once, a client has reached out to me while on hold with the vendor!

The Hidden Costs of Hiring a Junior DBA

Junior DBAs are generally wonderful people. I don’t bash people, pretty much ever. Almost all of them come from other parts of the overall IT ecosystem. The strongest DBA skills come from a LOT of study and a LOT of practice. You don’t get that in 2 years.

Your shiny new DBA (if you have one) might not realize the nuances of the following critical items:

Index design at the table level – too many, not enough, duplicates, etc.
Query tuning – When is a cursor fine, and when is it bad? What’s killing tempdb? What the heck is RBAR?
Inefficient server level configurations. As I typed this, I found 2 major systems at a new client with some horrible default settings that are causing them very high CPU utilization. They are paying monthly for Enterprise licenses they shouldn’t need!
Security – Poorly managed permissions = open door for ransomware you may never fully recover from.
A Junior DBA may not be test restoring your databases. You MUST know that your backups are valid if you want to meet management expectations.
I did this early in my SQL career – ASK FOR HELP sooner rather than later! Junior “anything” people will frequently try too hard to figure something out on their own before seeking assistance.

How Not Having a DBA Hurts Your Business

Longer downtimes – lets face reality – having a seasoned veteran on speed-dial or in the office means a faster response and resolution. Opening a ticket with your vendor or hunting for a consultant available “now” is difficult, at best.
Slow performance leads to lost productivity, angry employees, and customers buying elsewhere. When they STOP complaining, it has gone way too far.
Compliance failures if security and audits aren’t handled properly can be costly in fines or remediation expenses.
Expensive cloud bills due to inefficient configurations. The cloud isn’t free, or even cheap. Don’t take that nonsense from the cloud vendors. It’s far worse if you are paying for double the hardware of what you really need. Having access to DBA skills can help you right-size!

What’s the Right Balance?

A full-time mid to senior DBA makes a lot of sense when there are multiple SQL Servers (25+), various projects happening, tight RPO/RTO requirements, etc.

I ran a poll on LinkedIn asking where the line is drawn on how many SQL Servers a 5-7 year DBA can support:

With a small number of servers, a full-time Sr. DBA is probably overkill. They are going to get bored. I spent a year baby-sitting 4 servers for one of the Big 4 accounting firms. I worked 30 minutes a day and drank a lot of coffee. Until I couldn’t take it anymore and started Dallas DBAs.

With a limited footprint, you are probably better off outsourcing. You might look for a Senior DBA that freelances on the side (availability might be an issue), or a firm that offers a mix of maintenance and off-hours. Dallas DBAs is the best in the world of course (😊):

The Bottom Line: What Should You Do Next?

Get a SQL Server Health Check done. Periodically.
Look for signs of database neglect: Slow queries, long downtimes, too many people with sysadmin rights.
Don’t wait until you’re in emergency mode! Get ahead of the chaos today.

Thanks for reading!

Kevin

My Newsletter (full version)

New Pocket DBA® clients get the first month FREE!

Join my ‘Accidental SQL DBA‘ group here

Get a Free SQL Server Health Check!

SQL Server Performance – What IT Leaders Need to Know

March 13, 2025 by Kevin3NF Leave a Comment

IT leaders have a lot on their plates! Budgets, staffing, security, uptime, and keeping everything running smoothly.

When SQL Server performance becomes an issue, a common reaction is often to throw money (or Google) at the problem: more CPUs, bigger servers, or Enterprise licensing. But before you go visit the CFO, let’s review some things that really impact SQL Server performance.

More CPUs Might Not Be the Solution

Adding more CPUs sounds like an easy fix, but SQL Server performance issues are often tied to blocking, inefficient indexing, or poor query design rather than CPU bottlenecks. SQL Server licensing costs are per core, so scaling up can drain your budget without improving performance. A SQL Server health check can pinpoint the real bottlenecks before you commit to more hardware. Your DBA team (staff or outside contractors) should know how to quickly identify what is causing the issue as it happens and over time. A CPU bump in a cloud environment as a stop-gap is a valid step to consider, but let the DBA gather the SQL stats first. Restarts wipe out most of the performance meta-data.

Enterprise Licenses Might Not Be the Solution

Enterprise Edition unlocks powerful features and more CPU/memory capacity but if your server is struggling, the root cause may not be the edition. More often than not it’s the workload or configuration. Before upgrading, assess indexing strategies, query optimization, and memory settings. Standard Edition can handle a lot if configured properly, and sometimes the right tuning eliminates the need for an expensive upgrade. $7500 per Enterprise core is a costly guess.

Monitoring SQL Server Is Different from Monitoring Other Servers

Traditional server monitoring focuses on CPU, RAM, and disk usage, but SQL Server needs specialized monitoring. Wait stats, deadlocks, blocking, missing indexes, etc. are all potential indicators of SQL Server health. General-purpose server monitoring tools miss some of these details, leading to wasted troubleshooting time. SQL-specific monitoring is critical. Most non-SQL Server folks don’t realize that 90%+ memory utilization is common and expected for a SQL Server. 100% by the SQL Server process is probably a configuration item.

RPO and RTO Goals Impact Performance

Your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) dictate how quickly and accurately you can restore your data. Frequent transaction log backups improve RPO but can add overhead and can reduce Virtual Log File count. A tight RTO might require high-availability solutions like Always-On Availability Groups, which introduce additional performance considerations. Performance and recoverability must be balanced, not treated as separate issues.

Virtualization Adds Complexity

SQL Server generally runs well in a virtualized environment, but it requires tuning. Overcommitted CPU resources, shared storage contention, and improper memory allocation are common performance killers. Your DBA (or consultant) should be involved in virtualization decisions to avoid costly mistakes. Does your virtualization layer offer multiple paths to the storage (virtual controllers)? Are you taking advantage of that?

Indexing Strategies Matter More Than You Think

Poor indexing is one of the biggest performance killers. Without the right indexes, SQL Server scans entire tables instead of seeking just the rows it needs—slowing everything down. Worse, adding too many indexes can cause excessive writes and degrade performance. Finding the right balance is key. Index tuning is both art and science, and should be reviewed periodically as workload changes.

Maintenance Plans Aren’t Just a Checkbox

SQL Server needs regular maintenance: index rebuilds (it depends), statistics updates (it does not depend), and proper backup strategies. Many IT teams assume that setting up a default maintenance plan is enough. It’s not. A neglected database will slow down over time, impacting performance in ways that hardware upgrades won’t fix.

The Bottom Line

SQL Server performance resolution isn’t just about throwing more resources at the problem. Tt’s about understanding how SQL Server works. If your team doesn’t have a dedicated DBA, getting an expert to assess your setup can save you a lot of time, money, and frustration.

Get a free Health Check for your most important SQL Server today from Dallas DBAs.

SQL Server Time Bombs

February 28, 2025 by Kevin3NF Leave a Comment

Common Reasons for Emergency SQL calls

If you are a production DBA (or Accidental prod DBA) you’ve gotten that frantic call in the middle of the night. Or maybe during little Suzy’s soccer game? Something broke, alerts are firing, nobody can order your widgets and the help desk tech has no idea where to start.

I’ve been on call since 2006. These are the most common things I’ve been called for at the worst times:

Tempdb growth

When tempdb grows out of control, it can easily fill a drive and stop the SQL Server in its tracks. Every database (potentially) uses the same tempdb to do its sorting, aggregating, maintenance and spilling of large transactions. That is the purpose of it.

Nothing in the system will stop a really bad query from filling up the tempdb data files to consume the entire drive. Once it fails the system should be ok/online, but you still have a full drive firing alerts at the sysadmins.

How to prevent: cap the tempdb files to maximum size that will keep alerts from firing for disk space. Also, set up alerts for errors 1105 (data file is full) and 9002 (log file is full). These go to the DBA.

You cannot stop bad queries if you allow users to run Ad Hoc, but you can limit the blast radius.

Disk Space

I don’t have enough toes to count the number of times I’ve been hit with an on-call because disk space was causing issues in SQL Server. Most of the time I search for all files, sort by size descending and look at the biggest files. VERY frequently someone dumped a backup file there, or created a new DB on the wrong drive.

How to prevent: Log the output of exec master..xp_fixeddrives and keep an eye on it. Set permissions for just admins and SQL Service accounts to limit Joe’s ability to break things.

Backup failures

Tend to these quickly, as your RPO might be in jeopardy if backups are missing. Easily 80% of the backup failures I see are related to disk space (see above). 19% are network related. 1% is everything else.

How to Prevent: see Disk Space above. For T-log backups I set the job step to retry once before failing. Weird network blips shouldn’t cause someone to wake me up.

Patching
This is probably the hardest to deal with, but it comes up often. If you are not mostly current on Cumulative Updates and Security Patches, expect this call. Especially during a monthly Windows patch cycle – which doesn’t include SQL Server patches.

How to Prevent: see Disk Space above. For T-log backups I set the job step to retry once before failing. Weird network blips shouldn’t cause someone to wake me up.

The Bottom Line:
An ounce of prevention prevents your phone from ringing. Pay close attention to the setup and maintenance of your SQL Server to stop these issues before they start!

IRL #8: SQL Server Down

September 6, 2023 by Kevin3NF Leave a Comment

Problem:

My PocketDBA® client called in with a non-responsive SQL Server. Application errors, timeout, alerts firing. SQL Server box is up, and the services are running.

Background:

This is a stand-alone bare metal server located at a hosting facility with some local storage as well as NAS storage. SQL Server 2016 Standard Edition, 8 cores, 128GB RAM. The primary database is nearing 1TB and is has multiple filegroups (Active, Archive, Indexes, etc.). This database and apps that touch it ARE the company.

Investigation:

Attempts to Login failed – SSMS, Dedicated Admin Connection (taken), SQLCMD
Errorlog accessed directly from the file system showed:
- Log Backup failures
- Multiple Slow I/O requests: “SQL Server has encountered 112 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [H:\Company_LOG\Company_log.ldf] in database id 42”
- Sp_whoisactive revealed a long running LOG backup reading from the H drive, writing to the N drive.
- Windows Resource Monitor revealed Disk H Active Time (%) at 100, Disk Queue length – 0.
During this process the database went into suspect mode as it lost its connection to the .LDF completely. Verified the file was still in the expected location.
- Cannot bring offline or online via any method (due to the above log backup)

Process:

KILL the Log Backup and disable the job. No change after 30 minutes for the KILL to complete, no additional traffic on this database.
Contact the hosting firm to open a critical support ticket to check the NAS.
Host confirmed cable settings, O/S settings, network settings and restarted the NAS.

Results:

After the NAS reboot, everything came up and appeared normal.
Brought the database online with no issues and a quick run through the Recovery process

Conclusions:

Always verify:
- Services are running
- Drives are available
- Connections can be made
- Sp_whoisactive – run it early in the process
Often times, SQL Server is a victim of something else.
NAS vs SAN…largely a budget issue but has performance implications
Make sure you have the phone numbers handy for anyone supporting your systems and an SLA in place.
High Availability is a really good idea. This was a 3 hour outage.

Other posts in the IRL – In Real Life series:
https://dallasdbas.com/category/sql/irl/