
DallasDBAs.com

SQL Server Database Consulting


Troubleshooting

The Ironic DBA—Don’t Be Afraid of the Climb

October 14, 2019 by SQLandMTB

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Yes, my blogging pace has slowed somewhat, but I’m back with something to share. Last time I wrote about getting back to basics and reinforcing my knowledge of introductory SQL Server concepts. This week I want to go in the opposite direction and share why I think you should regularly push yourself out of your comfort zone.

The view from the top is worth the climb.

Climbing Sucks

I’ve mentioned before that I love mountain biking—I’m not great at it, but I love getting out on the trails. Unfortunately, I also have very low heat tolerance, and so I was almost completely off my bike for the majority of the Texas summer—about 2.5 months. I did sneak one ride in before the worst heat abated about 3.5 weeks ago, but it didn’t end well.

I’ve been able to get back out and do three or four trail rides a week for the last three weeks. The trail I’ve frequented is one of the closest to my house, though it’s not a favorite. There’s not a lot of elevation at this trail, but after so long off the bike even gentle slopes were tough to climb at first. To make matters worse, I’m overweight, have some significant balance issues, and absolutely suck at climbing.

Fast forward to just last night, when I went out and rode a different trail—the same one that hadn’t gone well 3.5 weeks ago. This trail is not the longest or hardest in the area, nor does it have the most climbing, but it’s been my nemesis since the first time I put down tires on its dirt. I have a less than 50% completion record at this particular trail, so I’m determined to conquer it.

Last night’s ride was pretty good. I completed the entire trail for the first time in months, and set some personal records on a few segments along the way. But here’s the coolest thing about the ride. At one point I made it to the top of a climb that’s typically pretty tough for me and took a quick water break. I immediately noticed that the climb had felt easier than expected, but I figured that was due to intentionally choosing easier gears than usual. I was pleasantly surprised to look down at my gears and realize that I was in 9th gear when I thought I was probably in 6th or 7th. (For those who don’t ride bikes, the lower the gear number the “easier” the gear.)

The time I’ve been spending getting back to trail riding regularly is already beginning to pay off. I’m back to feeling comfortable on the bike, my fitness level is increasing again, and my ability to ride harder, longer, steeper trails is gradually improving.

Climbing is Awesome

Tim Chapman presenting “Troubleshoot SQL Server Like a Microsoft Engineer”

“That’s great,” you say. “Good for you. But what does this have to do with SQL Server stuff?” I’m glad you asked!

I recently attended a fantastic one-day workshop that was made possible by my local PASS group, the North Texas SQL Server User Group. After the huge success of this year’s SQL Saturday Dallas, and in particular the response attendees had to Brent Ozar’s (b|t) performance tuning pre-con, the group decided to bring Tim Chapman (t) in for a one-day workshop entitled “Troubleshoot SQL Server Like a Microsoft Engineer.”

I was looking forward to attending even though I knew the vast majority of the class was probably going to be over my head. I’ve advanced fairly rapidly in my SQL Server knowledge since beginning this journey a little over four months ago, but the reality is I’ve only got four months of experience. That’s actually far too little time and experience to make the most of a class at this level. So why bother attending?

Learning is a climb, sometimes a very arduous climb. Just like some of my trail rides, there are moments when I can’t reach my learning goals and have to take a step back and start over. Some days the learning is smooth and the climbing is easy, boosting my confidence and my progress. Some days, the climb is hard, but not as hard as expected due to the work I’ve put in previously.

It’s for this very reason that I wanted to attend this workshop, and the same reason I’m currently working my way through Brent Ozar’s Senior DBA Class. Both are way over my head and deal with issues I won’t have to worry about being directly responsible for any time soon. But the classes make me climb and reach for a goal. Such classes make me dig deep and recall everything I’ve learned so far and apply it as best I can. Such classes expose holes in my current knowledge set and help flesh out a trail map for what I need to learn moving forward to become the best DBA I can be.

Sometimes the climb sucks, but getting to the top is awesome. Onward to the next peak!

Troubleshoot SQL Server Like a Microsoft Engineer—A Review

As for Tim Chapman’s class itself, let me give you a quick review and share my thoughts and takeaways from the day.

First up, I didn’t know what to expect walking into the class. Still being somewhat new to all this, I sort of expected the class to rapidly accelerate to light speed and beyond, leaving me grasping for whatever small tidbits I could glean. That never happened.

Tim is an excellent presenter, but beyond that he’s an excellent teacher. So many technical people who are very smart, know their stuff, and have years of experience often don’t realize just how far beyond the average newbie they really are. But not once did I feel like the class was moving too fast for me to keep up, and a great deal of this is due to how Tim broke everything down logically, built concepts upon one another, and didn’t make assumptions that everyone in the room knew more than they really did.

Having said that, there definitely was a certain level of foundational knowledge required in order to make the most of the class, and thankfully my studies thus far proved to be enough. There were a few concepts and pieces in SQL Server that I’d never heard of before (namely triggers and forced query plans), but by and large I was familiar enough with SQL Server, relational databases, nomenclature, and SSMS to never get completely lost.

In a nutshell, here’s what Tim covered in roughly seven hours’ worth of sessions:

  • An Overview of Troubleshooting
  • Queries and Waits (a simple example query follows this list)
  • Extended Events
  • Query Store
  • Most Common Problems
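
Since I can’t reproduce Tim’s material here, the flavor of the “Queries and Waits” portion can be sketched with a simple top-waits query against the DMVs. This is purely my own illustration, not anything from the class:

-- Top waits on the instance since the last restart (illustrative; the filter list is far from complete)
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_sec,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;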

One of the most encouraging things Tim shared, at least for a new DBA like me, was this concept: “Often, there are 5 or fewer things to look at for a server to solve problems. Like the Pareto Principle: 20% causes 80% of the issues. Today’s Goal: Identify and master the 20%.”

This is why so much of the class was understandable and relatable. The truth is, knowing how to fix the other 80% of possible problems in SQL Server means you’ve either gained years and years of experience, learned how to ask Google and StackOverflow the right questions, or both. Stay in the DBA field long enough and you’re sure to run into something rare, an edge case that very few people have seen. But beyond those edge cases, the vast majority of the problems we see are repeated regularly. Getting a grasp on the underlying issues that regularly rear their heads will help me be a solid DBA while giving me the foundation to troubleshoot those rare issues.

Big thanks to Tim for helping me put together a more direct path to follow on my self-guided learning. If any of you reading this ever have the chance to take this class from Tim in the future, I highly recommend it. I’d like to sit through it again myself sometime in the future when I’ve got more knowledge and experience in my tool kit.

Also thanks to NTSSUG for investing in your members and bringing Tim in to teach us. I’m looking forward to more opportunities in the future.

That’s all for this post. Join me next time for the next episode in The Ironic DBA Files.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

    • Prequel: The Ironic DBA—Starting a New and Unexpected Career
    • Episode 1: You Back That Up?
    • Episode 2: Attack of the Corruption
    • Episode 3: Revenge of the Index
    • Episode 4: A New Primary Key
    • Episode 5: The Maintenance Plan Strikes Back
    • Episode 6: Return of the TSQL
    • Episode 7: The Backup Awakens
    • Episode 8: The Last Rebuild
    • Episode 9: Rise of the Clients
    • Review One: A SQL Story
    • It’s Hip to Be Square
    • Rock Around the Clock
    • Failure is Always an Option
    • Back to Basics


Filed Under: Accidental DBA, Apprentice, Career, Cycling, EntryLevel, Personal, Troubleshooting

IRL #5: Massive T-Log File

August 12, 2019 by Kevin3NF

Problem:

I had a customer call in this week after struggling to resolve an issue for a couple of days. Their transaction log file had grown to 400+GB and was filling the drive to the point that they had to keep adding space just to keep operations online. This was for a core internal system. There are no staff DBAs here, but plenty of very sharp sysadmins, developers, network folks, etc.

Note: I have written about the most common cause of this issue here, but this was not the resolution in this case. The embedded video is far and away number one in hits of all my SQL videos, showing just how common log file growth issues are.

Background:

  • Current SQL Server version
  • Data file around 90GB
  • Log backups enabled, occurring and verified (250MB or so every 30 minutes)
  • Replication not in play.

Investigation:

  1. Run DBCC OpenTran to see if some process was left hanging
  2. Check log_reuse_wait_desc in sys.databases for the database in question (a quick sketch follows this list; see the “Wrapping Up” section for a proper query).
  3. Review Disk Usage report for the database – verify if the log file is full or just big and empty
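
For reference, steps 1 and 2 look roughly like this in T-SQL. This is a minimal sketch with a placeholder database name, not the “proper query” mentioned above:

-- Step 1: look for a long-running open transaction in the database
DBCC OPENTRAN ('YourDatabase');

-- Step 2: see why the transaction log can't be reused
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';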

Findings:

  1. Found an open transaction that had been sitting there for 2.5 days.
  2. sys.databases showed ACTIVE_TRANSACTION for our database, confirming Finding 1. Other possible values for log_reuse_wait_desc are listed here
  3. Disk usage report showed the T-log was 99% used

Actions Taken:

  1. Received permission from the application team to kill the SPID identified in DBCC OPENTRAN
  2. log_reuse_wait_desc in sys.databases changed to LOG_BACKUP as expected
  3. Disk usage report now showed 99% unused.
  4. A fairly lengthy cycle of running CHECKPOINT, log backups, and shrink file attempts got us back to a 2GB .LDF file (a sketch of that cycle follows this list). The log file autogrowth was 10MB…we had 32K+ VLFs once the root problem was resolved by killing the SPID
  5. Changed the autogrowth to 512MB to reduce the VLF count in case this happens again
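
Here’s a rough sketch of that cycle and the autogrowth change. The database name, logical log file name, and backup path are all placeholders, and in practice the backup/shrink steps get repeated until the file actually comes down:

USE YourDatabase;
GO
CHECKPOINT;
GO
BACKUP LOG YourDatabase TO DISK = N'X:\Backups\YourDatabase_log.trn';
GO
DBCC SHRINKFILE (YourDatabase_log, 2048);   -- target size in MB
GO
-- Raise autogrowth from 10MB to 512MB to keep the VLF count reasonable
ALTER DATABASE YourDatabase
    MODIFY FILE (NAME = YourDatabase_log, FILEGROWTH = 512MB);
GO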

Results:

  • Application performance immediately improved
  • Reclaimed 400GB of disk space
  • Educated my client on the basics of log file maintenance, how they are written to, etc.
  • Everyone slept well that night – no on call wakeups
  • Total time: 1 hour

Conclusion:

  • Most of the time this issue comes from a lack of log backups…but not always.  Be ready with the tools for deeper research
  • I was fully prepared to have them install sp_whoisactive if necessary for deeper digging (a minimal usage sketch follows this list)
  • It pays to have a DBA in your Pocket if you are not going to (or don’t need to) staff one
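
If it had come to that, running it is about as simple as it gets. A minimal sketch, assuming sp_whoisactive is already installed in master:

-- Show active requests and sessions, including open transaction details
EXEC sp_WhoIsActive @get_transaction_info = 1;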

Other posts in the IRL – In Real Life series:
https://dallasdbas.com/category/sql/irl/

Thanks for reading!

Kevin3NF


Filed Under: IRL, Troubleshooting

Announcement: Pocket DBA™ service

February 5, 2019 by Kevin3NF

Several months ago, we posted a blog about keeping a Database Administrator on retainer for emergencies, called “Put a DBA in Your Pocket”

We cross-posted a version over to LinkedIn at the same time.

The response to both was overwhelmingly positive!

So much so that we are bringing this idea on as a new service offering, effective today!

This service comes with different levels for different needs and budgets.  It is primarily intended for you to be able to get a quick response from someone you trust (that’s us…) when bad things happen to your database server, which they eventually will.  It’s the nature of tech.

In addition to being there for emergencies, you also get some periodic proactive benefits as well.

All of the pricing, benefits and terms are on the Pocket DBA™ page, separate from our current Services page.

Launch Special: All new customers that sign up in February qualify for special rates and additional benefits… This is a limited-time offer; please message me directly for more details.

Contact us today with any questions, or to grab one of those 3 early-adopter spots!

This service is only available to US businesses at this time.

Filed Under: Dallas DBAs, Emergency, SQL, Troubleshooting

IRL #2: Cannot Connect to SQL Server

January 23, 2019 by Kevin3NF

IRL #2 – Connectivity Issues

Problem: An application owner was trying to connect to SQL Server to create a database, and the connection failed.  The error message suggested checking the instance name and confirming the “Allow Remote Connections” option is set to True.

Background: This client has a wide variety of SQL Servers in Prod, Test and Dev.  Many different versions and editions, quite a few coming from 3rd party vendors.  This was in a new test environment for an application already up and running in Prod.

Resolution Attempts:

  • Try to Connect via IP address – failed
  • Verify Instance name – verified
  • Verify “Allow Remote Connections” is set to 1/True – it is (a quick verification query follows this list)
  • Verify SQL Browser Service Running – discovered Set to Disabled
  • Verify TCP/IP network protocol enabled – discovered disabled
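
For the remote connections check, here is one way to verify things from T-SQL. It confirms the setting and shows which protocol existing sessions are using, though fixing a disabled Browser service or TCP/IP protocol is Configuration Manager territory:

-- The SSMS "Allow remote connections" checkbox maps to the 'remote access' option (run_value 1 = enabled)
EXEC sp_configure 'remote access';

-- Which protocol are current connections actually using? (Shared Memory vs. TCP)
SELECT session_id, net_transport, local_net_address, local_tcp_port
FROM sys.dm_exec_connections;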

Solution:

  • Enable TCP/IP for Named Instance SQLEXPRESS03
  • Restart Instance
  • Set SQL Browser Service to Automatic and start
  • Test logon via UDL file – success

Lessons you can benefit from:

  • A default SQL Server Express install only enables Shared Memory connections
  • Error messages are not always going to give you complete information
  • Have a process for testing

This entire process took about 10 minutes, including the instance restart.  There are 3 other Express edition installs on this server.

 

Other posts in the IRL – In Real Life series:

https://dallasdbas.com/category/sql/irl/

Thanks for reading!

Kevin3NF


Filed Under: Configuration, Install, IRL, Troubleshooting

Video: Getting Started with SQL Server Execution Plans

January 17, 2019 by Kevin3NF

I put this video together with about ten minutes of T-SQL code prep and one take of the video.  It is not meant to be an all-inclusive lesson…merely an introduction to how to read SQL Server Execution Plans (Right to Left, Top to Bottom, etc.) and some basic things to look at.  Senior performance tuning DBAs…you probably don’t want to watch this 🙂
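
If you want something to poke at on your own instance, here’s a trivial way to get a plan to look at. The query itself is just a placeholder; any query of your own will do, or use the “Include Actual Execution Plan” button in SSMS:

-- Return the estimated execution plan as XML instead of running the query
SET SHOWPLAN_XML ON;
GO
SELECT o.name, o.type_desc      -- placeholder query; swap in one of your own
FROM sys.objects AS o
WHERE o.type = 'U';
GO
SET SHOWPLAN_XML OFF;
GO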

Comments and likes are appreciated if you think it has value…

Thanks for watching!

Kevin3NF


Filed Under: Accidental DBA, Beginner, EntryLevel, Performance, Troubleshooting

IRL #1: Non-yielding IOCP Listener

January 7, 2019 by Kevin3NF

IRL – internet lingo for “In Real Life”

This is the first in what I hope will be a regular series of posts that highlight a problem my team encountered on a SQL Server, what the cause was, and how we fixed it.

We will of course anonymize the details.  Part of this is to tell interesting stories, but a bigger part is to help future Googlers find how we resolved it.

IRL #1 – the Non-Yielding IOCP Listener

Problem: SQL Server running a major ERP system suddenly becomes non-responsive, effectively shutting down an entire retail company.  Online sales, Distribution, Customer Service…all of it.  2 weeks before Christmas.

Background:

  • SQL 2012, SP3 – Enterprise
  • 2-node Failover Cluster Instance
  • Replication Publisher
  • Other databases on this instance as well
  • Beefy hardware
  • No issues like this in at least a year that I have known them
  • No changes to the SQL code, SQL Instance or any application deployments

Investigation:

ERRORLOG (always go to the SQL Errorlog early in the process):

IO Completion Listener (0x1d8c) Worker 0x00000000056AC160 appears to be non-yielding on Node 1. Approx CPU Used: kernel 0 ms, user 15000 ms, Interval: 15002.
Using 'dbghelp.dll' version '4.0.5'
**Dump thread - spid = 0, EC = 0x0000000000000000
***Stack Dump being sent to F:\MSSQL11.MSSQLSERVER\MSSQL\LOG\SQLDump0001.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   12/14/18 12:43:38 spid 7328
*
* Non-yielding IOCP Listener
*
* *******************************************************************************
* -------------------------------------------------------------------------------
* Short Stack Dump
Stack Signature for the dump is 0x000000000000027C
External dump process return code 0x20000001.
External dump process returned no errors.

IO Completion Listener (0x335c) Worker 0x00000000036AC160 appears to be non-yielding on Node 0. Approx CPU Used: kernel 0 ms, user 15000 ms, Interval: 15002.
IO Completion Listener (0x1d8c) Worker 0x00000000056AC160 appears to be non-yielding on Node 1. Approx CPU Used: kernel 0 ms, user 23890 ms, Interval: 25005.

***Unable to get thread context for spid 0
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   12/14/18 12:44:33 spid 13940
*
* Non-yielding Scheduler
*
* *******************************************************************************
Stack Signature for the dump is 0x000000000000027C
External dump process return code 0x20000001.

Note there are 2 different Non-yielding messages – Schedulers and IOCP Listener.

Also, stack dumps are being generated.  I do not have the tools or skills to analyze a stack dump.  I rely heavily on Microsoft for this when needed.

The stack dump .txt and .log files contained nothing other than what was in the ERRORLOG.
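
If you’d rather pull those entries from T-SQL than open the file, something like this works. The search string is just my assumption of what I’d filter on:

-- Search the current error log (0 = current, 1 = SQL Server log) for non-yielding messages
EXEC sys.xp_readerrorlog 0, 1, N'non-yielding';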

Resolution Attempts:

  • Fail over to the B node.  Because it’s a cluster, and this might be a one-time glitch in the Matrix, right?
    • Post-failover the system was up for 30 minutes as the Application servers were restarted one by one.
    • When app server “4” was started we almost immediately got a repeat of the above behavior.
  • Try again, with fewer app servers – nope.
  • I suggested sending the stack dump to MS early on…but we kept trying failovers and restarts, thinking the App server load might be the issue.
  • We upgraded to the current SP, as this error in SQL 2008 was resolved in service packs, and in CUs for 2016/2017.  This did not change anything.
  • Lather, rinse, repeat in various ways for <redacted> hours until we decided to get Microsoft involved.  We opened a Premier CritSit case, which triggers a LOT of attention and internal alarm bells on the MS side.  This has not changed since my last contract there in 2006.
  • Inside of 2 hours, MS had analyzed the dump and sent back this:
    • NonYielding Call Stack
      # Call Site
      00 <non-SQL Server .dll>
      01 0x0
      02 0x0
    • “We see the third party DLL located in the SQL Server address space: C:\Windows\System32\<non-SQL Server>.dll.”
    • A really good article that does reference the situation
    • “We see <non-SQL Server>.dll being called and it is interfering with the IOCP listener which is the thread that accepts incoming connections/TDS packets therefore preventing additional connections (the behavior we see when we cannot connect via SSMS, for example). We see the thread stalled for 15 seconds and 15 threads in SQL that have that on top of their stack. What is the <non-SQL Server>.dll?  It is calling a jmp detour in ntdll.dll.”

Solution:

  • Identified the .dll and verified with the vendor what it was doing
  • Uninstalled the application that put it there (both nodes)
  • Restarted everything with no further issues.

Lessons you can benefit from:

  • Stay current on Service Packs – many of the issues are fixed in SPs and CUs (a quick version check follows this list)
  • If you get recurring stack dumps, call MS sooner rather than later…even on a credit card, $500 is cheaper than Prod being down
  • Karthik PK from MS has some good posts here and here on this
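
Checking where a server stands only takes a second. A minimal sketch:

-- Build, patch level, and edition of the instance
SELECT SERVERPROPERTY('ProductVersion')     AS build,
       SERVERPROPERTY('ProductLevel')       AS sp_level,
       SERVERPROPERTY('ProductUpdateLevel') AS cu_level,   -- NULL on builds that predate this property
       SERVERPROPERTY('Edition')            AS edition;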

I am specifically leaving out the name of the offending .dll for two reasons:

  • Client anonymity
  • We don’t shame vendors around here…it was probably a small bug in a product update

If you enjoyed this post and want to see more like it, please leave a comment, or ping me on Twitter

Other posts in the IRL – In Real Life series:

https://dallasdbas.com/category/sql/irl/

Thanks for reading!

Kevin3NF


Filed Under: IRL, SQL, Troubleshooting
