Problem:
I had a customer call in this week after struggling to resolve an issue for a couple of days, Their Transaction Log file had grown to 400+GB and was filling the drive to the point they had to keep adding space to the drive to keep operations online. This was for a core internal system. There are no staff DBAs here, but plenty of very sharp sysadmins, developers, network folks, etc.
Note: I have written about the most common cause of this issue here, but this was not the resolution in this case. The embedded video is far and away number one in hits of all my SQL videos, showing just how common log file growth issues are.
Background:
- Current SQL Server version
- Data file around 90GB
- Log backups enabled, occurring and verified (250MB or so every 30 minutes)
- Replication not in play.
Investigation:
- Run DBCC OpenTran to see if some process was left hanging
- Check master.sys.databases.Log_reuse_wait_Desc for the database in question. See the “Wrapping Up” section for a proper query.
- Review Disk Usage report for the database – verify if the log file is full or just big and empty
Findings:
- Found an open transaction, sitting there since 2.5 days ago.
- sys.databases showed ‘Active_Transaction’ for our database, confirming Finding 1. Other possible values for log_reuse_wait_desc here
- Disk usage report showed the T-log was 99% used
Actions Taken:
- Received permission from application team to kill the SPID identified in DBCC Opentran
- Sys.Databases description changed to Log_Backup as expected
- Disk usage report now shows 99% unused.
- Fairly lengthy cycle of running Checkpoint, Log Backups and Shrink File attempts got us back to a 2GB .LDF file. The log file Autogrow was 10MB….we had 32K+ VLFs once the root problem was resolved by killing the spid
- Changed the Autogrow to 512MB in case this happens again to reduce the VLF count
Results:
- Application performance immediately improved
- Reclaimed 400GB of disk space
- Educated my client on the basics of log file maintenance, how they are written to, etc.
- Everyone slept well that night – no on call wakeups
- Total time: 1 hour
Conclusion:
- Most of the time this issue comes from a lack of log backups…but not always. Be ready with the tools for deeper research
- I was fully prepared to have them install sp_whoisactive if necessary for deeper digging
- It pays to have a DBA in your Pocket if you are not going to (or don’t need to) staff one
Other posts in the IRL – In Real Life series:
https://dallasdbas.com/category/sql/irl/
Thanks for reading!