
DallasDBAs.com

SQL Server Database Consulting


Troubleshooting

The Ironic DBA—Don’t Be Afraid of the Climb

October 14, 2019 by SQLandMTB

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Yes, my blogging pace has slowed somewhat, but I’m back with something to share. Last time I wrote about getting back to basics and reinforcing my knowledge of introductory SQL Server concepts. This week I want to go in the opposite direction and share why I think you should regularly push yourself out of your comfort zone.

The view from the top is worth the climb.

Climbing Sucks

I’ve mentioned before that I love mountain biking—I’m not great at it, but I love getting out on the trails. Unfortunately, I also have very low heat tolerance, and so I was almost completely off my bike for the majority of the Texas summer—about 2.5 months. I did sneak one ride in before the worst heat abated about 3.5 weeks ago, but it didn’t end well.

I’ve been able to get back out and do three or four trail rides a week for the last three weeks. The trail I’ve frequented is one of the closest to my house, though it’s not a favorite. There’s not a lot of elevation at this trail, but after so long off the bike even gentle slopes were tough to climb at first. To make matters worse, I’m overweight, have some significant balance issues, and absolutely suck at climbing.

Fast forward to just last night, when I went out and rode a different trail—the same one that hadn’t gone well 3.5 weeks ago. This trail is not the longest or hardest in the area, nor does it have the most climbing, but it’s been my nemesis since the first time I put down tires on its dirt. I have a less than 50% completion record at this particular trail, so I’m determined to conquer it.

Last night’s ride was pretty good. I completed the entire trail for the first time in months, and set some personal records on a few segments along the way. But here’s the coolest thing about the ride. At one point I made it to the top of a climb that’s typically pretty tough for me and took a quick water break. I immediately noticed that the climb had felt easier than expected, but I figured that was due to intentionally choosing easier gears than usual. I was pleasantly surprised to look down at my gears and realize that I was in 9th gear when I thought I was probably in 6th or 7th. (For those who don’t ride bikes, the lower the gear number the “easier” the gear.)

The time I’ve been spending getting back to trail riding regularly is already beginning to pay off. I’m back to feeling comfortable on the bike, my fitness level is increasing again, and my ability to ride harder, longer, steeper trails is gradually improving.

Climbing is Awesome

Tim Chapman presenting “Troubleshoot SQL Server Like a Microsoft Engineer”

“That’s great,” you say. “Good for you. But what does this have to do with SQL Server stuff?” I’m glad you asked!

I recently attended a fantastic one-day workshop that was made possible by my local PASS group, the North Texas SQL Server User Group. After the huge success of this year’s SQL Saturday Dallas, and in particular the response attendees had to Brent Ozar’s (b|t) performance tuning pre-con, the group decided to bring Tim Chapman (t) in for a one-day workshop entitled “Troubleshoot SQL Server Like a Microsoft Engineer.”

I was looking forward to attending even though I knew the vast majority of the class was probably going to be over my head. I’ve advanced fairly rapidly in my SQL Server knowledge since beginning this journey a little over four months ago, but the reality is I’ve only got four months of experience. That’s actually far too little time and experience to make the most of a class at this level. So why bother attending?

Learning is a climb, sometimes a very arduous climb. Just like some of my trail rides, there are moments when I can’t reach my learning goals and have to take a step back and start over. Some days the learning is smooth and the climbing is easy, boosting my confidence and my progress. Some days, the climb is hard, but not as hard as expected due to the work I’ve put in previously.

It’s for this very reason that I wanted to attend this workshop, and the same reason I’m currently working my way through Brent Ozar’s Senior DBA Class. Both are way over my head and deal with issues I won’t have to worry about being directly responsible for any time soon. But the classes make me climb and reach for a goal. Such classes make me dig deep and recall everything I’ve learned so far and apply it as best I can. Such classes expose holes in my current knowledge set and help flesh out a trail map for what I need to learn moving forward to become the best DBA I can be.

Sometimes the climb sucks, but getting to the top is awesome. Onward to the next peak!

Troubleshoot SQL Server Like a Microsoft Engineer—A Review

As for Tim Chapman’s class itself, let me give you a quick review and share my thoughts and takeaways from the day.

First up, I didn’t know what to expect walking into the class. Still being somewhat new to all this, I sort of expected the class to rapidly accelerate to light speed and beyond, leaving me grasping for whatever small tidbits I could glean. That never happened.

Tim is an excellent presenter, but beyond that he’s an excellent teacher. So many technical people who are very smart, know their stuff, and have years of experience often don’t realize just how far beyond the average newbie they really are. But not once did I feel like the class was moving too fast for me to keep up, and a great deal of this is due to how Tim broke everything down logically, built concepts upon one another, and didn’t make assumptions that everyone in the room knew more than they really did.

Having said that, there definitely was a certain level of foundational knowledge required in order to make the most of the class, and thankfully my studies thus far proved to be enough. There were a few concepts and pieces in SQL Server that I’d never heard of before (namely triggers and forced query plans), but by and large I was familiar enough with SQL Server, relational databases, nomenclature, and SSMS to never get completely lost.

In a nutshell, here’s what Tim covered in roughly seven hours’ worth of sessions:

  • An Overview of Troubleshooting
  • Queries and Waits (a simple example query follows this list)
  • Extended Events
  • Query Store
  • Most Common Problems
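
Since I can’t reproduce Tim’s material here, the flavor of the “Queries and Waits” portion can be sketched with a simple top-waits query against the DMVs. This is purely my own illustration, not anything from the class:

-- Top waits on the instance since the last restart (illustrative; the filter list is far from complete)
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_sec,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;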

One of the most encouraging things Tim shared, at least for a new DBA like me, was this concept: “Often, there are 5 or fewer things to look at for a server to solve problems. Like the Pareto Principle: 20% causes 80% of the issues. Today’s Goal: Identify and master the 20%.”

This is why so much of the class was understandable and relatable. The truth is, knowing how to fix the other 80% of possible problems in SQL Server means you’ve either gained years and years of experience, learned how to ask Google and StackOverflow the right questions, or both. Stay in the DBA field long enough and you’re sure to run into something rare, an edge case that very few people have seen. But beyond those edge cases, the vast majority of the problems we see are repeated regularly. Getting a grasp on the underlying issues that regularly rear their heads will help me be a solid DBA while giving me the foundation to troubleshoot those rare issues.

Big thanks to Tim for helping me put together a more direct path to follow on my self-guided learning. If any of you reading this ever have the chance to take this class from Tim in the future, I highly recommend it. I’d like to sit through it again myself sometime in the future when I’ve got more knowledge and experience in my tool kit.

Also thanks to NTSSUG for investing in your members and bringing Tim in to teach us. I’m looking forward to more opportunities in the future.

That’s all for this post. Join me next time for the next episode in The Ironic DBA Files.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

    • Prequel: The Ironic DBA—Starting a New and Unexpected Career
    • Episode 1: You Back That Up?
    • Episode 2: Attack of the Corruption
    • Episode 3: Revenge of the Index
    • Episode 4: A New Primary Key
    • Episode 5: The Maintenance Plan Strikes Back
    • Episode 6: Return of the TSQL
    • Episode 7: The Backup Awakens
    • Episode 8: The Last Rebuild
    • Episode 9: Rise of the Clients
    • Review One: A SQL Story
    • It’s Hip to Be Square
    • Rock Around the Clock
    • Failure is Always an Option
    • Back to Basics


Filed Under: Accidental DBA, Apprentice, Career, Cycling, EntryLevel, Personal, Troubleshooting

IRL #5: Massive T-Log File

August 12, 2019 by Kevin3NF

Problem:

I had a customer call in this week after struggling to resolve an issue for a couple of days. Their transaction log file had grown to 400+GB and was filling the drive to the point that they had to keep adding space just to keep operations online. This was for a core internal system. There are no staff DBAs here, but plenty of very sharp sysadmins, developers, network folks, etc.

Note: I have written about the most common cause of this issue here, but this was not the resolution in this case. The embedded video is far and away number one in hits of all my SQL videos, showing just how common log file growth issues are.

Background:

  • Current SQL Server version
  • Data file around 90GB
  • Log backups enabled, occurring and verified (250MB or so every 30 minutes)
  • Replication not in play.

Investigation:

  1. Run DBCC OpenTran to see if some process was left hanging
  2. Check log_reuse_wait_desc in sys.databases for the database in question (a quick sketch follows this list; see the “Wrapping Up” section for a proper query).
  3. Review Disk Usage report for the database – verify if the log file is full or just big and empty
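
For reference, steps 1 and 2 look roughly like this in T-SQL. This is a minimal sketch with a placeholder database name, not the “proper query” mentioned above:

-- Step 1: look for a long-running open transaction in the database
DBCC OPENTRAN ('YourDatabase');

-- Step 2: see why the transaction log can't be reused
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';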

Findings:

  1. Found an open transaction that had been sitting there for 2.5 days.
  2. sys.databases showed ACTIVE_TRANSACTION for our database, confirming Finding 1. Other possible values for log_reuse_wait_desc are listed here
  3. Disk usage report showed the T-log was 99% used

Actions Taken:

  1. Received permission from the application team to kill the SPID identified in DBCC OPENTRAN
  2. log_reuse_wait_desc in sys.databases changed to LOG_BACKUP as expected
  3. Disk usage report now showed 99% unused.
  4. A fairly lengthy cycle of running CHECKPOINT, log backups, and shrink file attempts got us back to a 2GB .LDF file (a sketch of that cycle follows this list). The log file autogrowth was 10MB…we had 32K+ VLFs once the root problem was resolved by killing the SPID
  5. Changed the autogrowth to 512MB to reduce the VLF count in case this happens again
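
Here’s a rough sketch of that cycle and the autogrowth change. The database name, logical log file name, and backup path are all placeholders, and in practice the backup/shrink steps get repeated until the file actually comes down:

USE YourDatabase;
GO
CHECKPOINT;
GO
BACKUP LOG YourDatabase TO DISK = N'X:\Backups\YourDatabase_log.trn';
GO
DBCC SHRINKFILE (YourDatabase_log, 2048);   -- target size in MB
GO
-- Raise autogrowth from 10MB to 512MB to keep the VLF count reasonable
ALTER DATABASE YourDatabase
    MODIFY FILE (NAME = YourDatabase_log, FILEGROWTH = 512MB);
GO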

Results:

  • Application performance immediately improved
  • Reclaimed 400GB of disk space
  • Educated my client on the basics of log file maintenance, how they are written to, etc.
  • Everyone slept well that night – no on call wakeups
  • Total time: 1 hour

Conclusion:

  • Most of the time this issue comes from a lack of log backups…but not always.  Be ready with the tools for deeper research
  • I was fully prepared to have them install sp_whoisactive if necessary for deeper digging (a minimal usage sketch follows this list)
  • It pays to have a DBA in your Pocket if you are not going to (or don’t need to) staff one
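
If it had come to that, running it is about as simple as it gets. A minimal sketch, assuming sp_whoisactive is already installed in master:

-- Show active requests and sessions, including open transaction details
EXEC sp_WhoIsActive @get_transaction_info = 1;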

Other posts in the IRL – In Real Life series:
https://dallasdbas.com/category/sql/irl/

Thanks for reading!

Kevin3NF


Filed Under: IRL, Troubleshooting

Announcement: Pocket DBA™ service

February 5, 2019 by Kevin3NF

Several months ago, we posted a blog about keeping a Database Administrator on retainer for emergencies, called “Put a DBA in Your Pocket”

We cross-posted a version over to LinkedIn at the same time.

The response to both was overwhelmingly positive!

So much so that we are bringing this idea on as a new service offering, effective today!

This service comes with different levels for different needs and budgets.  It is primarily intended for you to be able to get a quick response from someone you trust (that’s us…) when bad things happen to your database server, which they eventually will.  It’s the nature of tech.

In addition to being there for emergencies, you also get some periodic proactive benefits as well.

All of the pricing, benefits and terms are on the Pocket DBA™ page, separate from our current Services page.

Launch Special: All new customers that sign up in February qualify for special rates and additional benefits… This is a limited-time offer; please message me directly for more details.

Contact us today with any questions, or to grab one of those 3 early-adopter spots!

This service is only available to US businesses at this time.

Filed Under: Dallas DBAs, Emergency, SQL, Troubleshooting

IRL #2: Cannot Connect to SQL Server

January 23, 2019 by Kevin3NF

IRL #2 – Connectivity Issues

Problem: An application owner was trying to connect to SQL Server to create a database, and the connection failed.  The error message suggested checking the instance name and confirming the “Allow Remote Connections” option is set to True.

Background: This client has a wide variety of SQL Servers in Prod, Test and Dev.  Many different versions and editions, quite a few coming from 3rd party vendors.  This was in a new test environment for an application already up and running in Prod.

Resolution Attempts:

  • Try to Connect via IP address – failed
  • Verify Instance name – verified
  • Verify “Allow Remote Connections” is set to 1/True – it is (a quick verification query follows this list)
  • Verify SQL Browser Service Running – discovered Set to Disabled
  • Verify TCP/IP network protocol enabled – discovered disabled
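
For the remote connections check, here is one way to verify things from T-SQL. It confirms the setting and shows which protocol existing sessions are using, though fixing a disabled Browser service or TCP/IP protocol is Configuration Manager territory:

-- The SSMS "Allow remote connections" checkbox maps to the 'remote access' option (run_value 1 = enabled)
EXEC sp_configure 'remote access';

-- Which protocol are current connections actually using? (Shared Memory vs. TCP)
SELECT session_id, net_transport, local_net_address, local_tcp_port
FROM sys.dm_exec_connections;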

Solution:

  • Enable TCP/IP for Named Instance SQLEXPRESS03
  • Restart Instance
  • Set SQL Browser Service to Automatic and start
  • Test logon via UDL file – success

Lessons you can benefit from:

  • A default SQL Server Express install only enables Shared Memory connections
  • Error messages are not always going to give you complete information
  • Have a process for testing

This entire process took about 10 minutes, including the instance restart.  There are 3 other Express edition installs on this server.

 

Other posts in the IRL – In Real Life series:

https://dallasdbas.com/category/sql/irl/

Thanks for reading!

Kevin3NF


Filed Under: Configuration, Install, IRL, Troubleshooting

Video: Getting Started with SQL Server Execution Plans

January 17, 2019 by Kevin3NF

I put this video together with about ten minutes of T-SQL code prep and one take of the video.  It is not meant to be an all-inclusive lesson…merely an introduction to how to read SQL Server Execution Plans (Right to Left, Top to Bottom, etc.) and some basic things to look at.  Senior performance tuning DBAs…you probably don’t want to watch this 🙂
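
If you want something to poke at on your own instance, here’s a trivial way to get a plan to look at. The query itself is just a placeholder; any query of your own will do, or use the “Include Actual Execution Plan” button in SSMS:

-- Return the estimated execution plan as XML instead of running the query
SET SHOWPLAN_XML ON;
GO
SELECT o.name, o.type_desc      -- placeholder query; swap in one of your own
FROM sys.objects AS o
WHERE o.type = 'U';
GO
SET SHOWPLAN_XML OFF;
GO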

Comments and likes are appreciated if you think it has value…

Thanks for watching!

Kevin3NF


Filed Under: Accidental DBA, Beginner, EntryLevel, Performance, Troubleshooting

IRL #1: Non-yielding IOCP Listener

January 7, 2019 by Kevin3NF

IRL – internet lingo for “In Real Life”

This is the first in what I hope will be a regular series of posts that highlight a problem my team encountered on a SQL Server, what the cause was, and how we fixed it.

We will of course anonymize the details.  Part of this is to tell interesting stories, but a bigger part is to help future Googlers find how we resolved it.

IRL #1 – the Non-Yielding IOCP Listener

Problem: SQL Server running a major ERP system suddenly becomes non-responsive, effectively shutting down an entire retail company.  Online sales, Distribution, Customer Service…all of it.  2 weeks before Christmas.

Background:

  • SQL 2012, SP3 – Enterprise
  • 2-node Failover Cluster Instance
  • Replication Publisher
  • Other databases on this instance as well
  • Beefy hardware
  • No issues like this in at least a year that I have known them
  • No changes to the SQL code, SQL Instance or any application deployments

Investigation:

ERRORLOG (always go to the SQL Errorlog early in the process):

IO Completion Listener (0x1d8c) Worker 0x00000000056AC160 appears to be non-yielding on Node 1. Approx CPU Used: kernel 0 ms, user 15000 ms, Interval: 15002.
Using 'dbghelp.dll' version '4.0.5'
**Dump thread - spid = 0, EC = 0x0000000000000000
***Stack Dump being sent to F:\MSSQL11.MSSQLSERVER\MSSQL\LOG\SQLDump0001.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   12/14/18 12:43:38 spid 7328
*
* Non-yielding IOCP Listener
*
* *******************************************************************************
* -------------------------------------------------------------------------------
* Short Stack Dump
Stack Signature for the dump is 0x000000000000027C
External dump process return code 0x20000001.
External dump process returned no errors.

IO Completion Listener (0x335c) Worker 0x00000000036AC160 appears to be non-yielding on Node 0. Approx CPU Used: kernel 0 ms, user 15000 ms, Interval: 15002.
IO Completion Listener (0x1d8c) Worker 0x00000000056AC160 appears to be non-yielding on Node 1. Approx CPU Used: kernel 0 ms, user 23890 ms, Interval: 25005.

***Unable to get thread context for spid 0
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   12/14/18 12:44:33 spid 13940
*
* Non-yielding Scheduler
*
* *******************************************************************************
Stack Signature for the dump is 0x000000000000027C
External dump process return code 0x20000001.

Note there are 2 different Non-yielding messages – Schedulers and IOCP Listener.

Also, stack dumps are being generated.  I do not have the tools or skills to analyze a stack dump.  I rely heavily on Microsoft for this when needed.

The stack dump .txt and .log files contained nothing other than what was in the ERRORLOG.
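
If you’d rather pull those entries from T-SQL than open the file, something like this works. The search string is just my assumption of what I’d filter on:

-- Search the current error log (0 = current, 1 = SQL Server log) for non-yielding messages
EXEC sys.xp_readerrorlog 0, 1, N'non-yielding';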

Resolution Attempts:

  • Fail over to the B node.  Because it’s a cluster, and this might be a one-time glitch in the Matrix, right?
    • Post-failover the system was up for 30 minutes as the Application servers were restarted one by one.
    • When app server “4” was started we almost immediately got a repeat of the above behavior.
  • Try again, with fewer app servers – nope.
  • I suggested sending the stack dump to MS early on…but we kept trying failovers and restarts, thinking the App server load might be the issue.
  • We upgraded to the current SP, as this error in SQL 2008 was resolved in service packs, and in CUs for 2016/2017.  This did not change anything.
  • Lather, rinse, repeat in various ways for <redacted> hours until we decided to get Microsoft involved.  We opened a Premier CritSit case, which triggers a LOT of attention and internal alarm bells on the MS side.  This has not changed since my last contract there in 2006.
  • Inside of 2 hours, MS had analyzed the dump and sent back this:
    • NonYielding Call Stack
      # Call Site
      00 <non-SQL Server .dll>
      01 0x0
      02 0x0
    • “We see the third party DLL located in the SQL Server address space: C:\Windows\System32\<non-SQL Server>.dll.”
    • A really good article that does reference the situation
    • “We see <non-SQL Server>.dll being called and it is interfering with the IOCP listener which is the thread that accepts incoming connections/TDS packets therefore preventing additional connections (the behavior we see when we cannot connect via SSMS, for example). We see the thread stalled for 15 seconds and 15 threads in SQL that have that on top of their stack. What is the <non-SQL Server>.dll?  It is calling a jmp detour in ntdll.dll.”

Solution:

  • Identified the .dll and verified with the vendor what it was doing
  • Uninstalled the application that put it there (both nodes)
  • Restarted everything with no further issues.

Lessons you can benefit from:

  • Stay current on Service Packs – many of the issues are fixed in SPs and CUs (a quick version check follows this list)
  • If you get recurring stack dumps, call MS sooner rather than later…even on a credit card, $500 is cheaper than Prod being down
  • Karthik PK from MS has some good posts here and here on this
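
Checking where a server stands only takes a second. A minimal sketch:

-- Build, patch level, and edition of the instance
SELECT SERVERPROPERTY('ProductVersion')     AS build,
       SERVERPROPERTY('ProductLevel')       AS sp_level,
       SERVERPROPERTY('ProductUpdateLevel') AS cu_level,   -- NULL on builds that predate this property
       SERVERPROPERTY('Edition')            AS edition;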

I am specifically leaving out the name of the offending .dll for two reasons:

  • Client anonymity
  • We don’t shame vendors around here…it was probably a small bug in a product update

If you enjoyed this post and want to see more like it, please leave a comment, or ping me on Twitter

Other posts in the IRL – In Real Life series:

https://dallasdbas.com/category/sql/irl/

Thanks for reading!

Kevin3NF


Filed Under: IRL, SQL, Troubleshooting
