Let’s take a look at the final few steps of your initial server intake process and get your workflow up and running as you push toward performing regular server reviews. These last few steps are relatively easy and quick to set up.
Build a Backup Frequency List
Now that you’ve set up your Maintenance Job list, you should be able to see how backups are scheduled for your servers. This is the point where you begin pushing from simply gathering info on your servers towards ensuring your servers are set up correctly.
Depending on your situation and environment, you may not need to create an actual list in a document or spreadsheet. Sometimes simply having a familiarity with what databases are backed up and their frequency is enough, but it may be best to create an actual list in the beginning.
You’ll need to be able to determine what types of backups are happening—FULLs, DIFFs, LOGs—and when those backups occur. Once you’ve collated this information then you’re ready to see if the current backup scheme meets the expected Recovery Point Objective and Recovery Time Objective. If not, then getting your backups whipped into shape should be the first task you assign yourself for your new servers.
I’m personally at the point where I have enough servers spread across different environments that I need to create a new writeup of backup schedules for those environments. There are currently too many environments to keep straight in my head, which means it’s far too easy to miss whether or not a particular server group’s backups are on schedule and up-to-date.
How so? Some of the server groups I check are scheduled to create Full backups daily, while others are only creating these backups weekly. I have some server groups in which there’s a mix of scheduling, and others that have either native SQL Server backups, virtual device “snapshot” backups, or sometimes both. One client in particular has two server groups that I check regularly, one of which is using native SQL Server backups while the second group is primarily using virtual device backups.
So, if you decide to write up an actual list, what do you include? Here’s a not-so-comprehensive list of items you may want to include in your document:
- Database name
- Recovery model
- Backup types (FULL, DIFF, LOG)
- Backup frequency for each type (weekly, daily, hourly, etc.)
- Schedule/frequency of each backup type
- Backup path/device
- Backup retention period
- Backup compression yes or no
- Native SQL Server backup, virtual device backup, or other
- Maintenance Plan, Ola scripts, or other
The above is simply a sampler of items you may choose to include. Again, depending on your environment and SLAs, your list might look somewhat different.
Example Server Review Report
Now that you’ve gotten all of your server’s baselines gathered you’re ready to start implementing regular checks. How often you perform these server reviews will depend on your employer’s dictates or current SLAs. For me, I currently have several server groups I check daily, one server group that is checked twice per week, and one server group that is only checked once per week.
What’s included in my regular reports? In the coming weeks I’ll be stepping through how I get my reviews done, but for now I’m including a sample report below so you can get a sense of the sorts of items you’ll be looking over as you check your servers.
ABC Server Review 03-22-2021 Servers: Unable to connect to ABC-SQL-03. Upon ping "Ping request could not find host ABC-SQL-03.". Recommend applying CU22 for SQL Server 2017 and Security update for SQL Server 2017 CU22: January 12, 2021 [CVE-2021-1636] to ABC-SQL-01. Third Party Monitor: If there are any third party monitoring products in use—such as Solarwinds DPA or Redgate Monitor—any current alerts get notated here. Replication: ABC2020_Pub5 is reporting Critical performance. All other publications are reporting Excellent performance. Jobs: The job 'Write Stuff to the Table' failed at step 'Load Stuff' on ABC-SQL-01 on 03-21-21 at 4:00 AM. This job is scheduled to run hourly and failed previously on 03-17-21. The job has executed successfully since 03-21-21 at 5:00 AM. The job was the deadlock victim. Backups: All backups are up to date. Errorlogs: Login failure for user 'ABC\lcage' on ABC-SQL-02 on 03-21-21 at 7:36 AM. Reason: Could not find a login matching the name provided. [CLIENT: 184.108.40.206] 66 deadlocks on ABC-SQL-02 on 03-21-21 through 03-22-21. Alerts: No alerts to report. AG Health All Availability Groups reporting Healthy. Other:
Your own reports may look significantly different than what you see above. Obviously, if your servers are not using features such as replication or Availability Groups then there’s no need to include them in your report. Depending on your employer’s or client’s expectations—and your SLAs—your report may be more or less detailed. You may even have some sort of ticketing system in place.
In my case, I send my reports to my senior, Kevin, as an attachment to an email for his review. I’ll note any items I believe are critical and need immediate attention in the body of the email and highlight them in red text. There are certain issues which are appropriate for someone at my level that I will rectify on my own, otherwise Kevin and I will address issues together, or he will contact the necessary personnel who oversee particular aspects of the servers as needed. This is the workflow that works for us, but your practices and procedures may look different.
At this point, you may need to go even deeper and to build a full server inventory which is more comprehensive than what I’ve laid out in the last few posts.
This may also be the point where you need to do a full Health Check on your servers and look for any best practices that need to be implemented and note any problems you find that need to be addressed. Create a list of these items and your suggested recommendations for remediation, as well as a potential schedule of when and how it might be best to address these issues.
Stay tuned! In the next group of posts we’ll start an overview of implementing server monitoring and active reviews.