Apprentice

The Ironic DBA—Rock Around the Clock

August 27, 2019 by SQLandMTB Leave a Comment

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Last week I shared why you shouldn’t completely hate [square brackets], and this week I’m going to build on that theme a little bit more by showing you some minor tweaks to some scripts we use here at Dallas DBAs on a daily basis.

I Love it When Something Unplanned Comes Together

If you’ve been following my weekly Ironic DBA posts, you know that I’m new to this gig and have been learning things as rapidly as I can. It’s a little like being thrown into a pool to learn how to swim—though not the deep end. I’ve chronicled what I’ve learned each week and attempted to share it with the world. A funny thing about this week’s post requires a little backstory.

Kevin and I have known each other for years, and our families have gotten together weekly to play games for the last several years. Last time we were sitting around the table together, we were talking about my posts and I jokingly said something like, “If I don’t have a topic for a blog post in any given week, you need to ask me what I’m doing with my time.”

Then I proceeded to struggle to come up with a topic for this week. Oh, the irony.

Archimedes takes a bath and learns a thing.

What’s really cool is that I obviously did come up with something…unless the rest of this post is simply a ramble. I’ve long believed (as a former school teacher) that two of the best methods for learning are immersion and repetition. That’s been my approach to any of my self-guided SQL Server studies, and it’s paid off so far. Earlier this week I had an Archimedes-type Eureka! moment when some various threads I’ve been pulling all came together.

One of the stepping stones I’ve been using in my studies is Kevin’s post “Top 10 SQL Server functions every DBA should know by heart.” I’ve revisited that post several times but don’t always have many relevant opportunities to put those functions into practice on my VM. Either way, I’ve been using the repetition method to remind myself that these functions exist. The relevant function for today is Getdate().

Also relevant this week is this excellent post from Ken Fisher (b|t) about the built-in agent_datetime() function in SQL Server. I first read about it because Kevin found it and tweeted about it. We ended up using it in the script edits you’ll see below.

Generic SQL Server Output Sucks

As an “artsy” type person, some of the ways SQL Server displays information pains me. I get that we’re working with data and it doesn’t always have to be beautiful, but can we at least get something a bit more reader-friendly? The answer is usually, “Yes, but you’ll have to work for it.”

Now that I’ve been doing daily server reviews in Production for a while, I’ve gotten pretty familiar with three scripts that I run against servers every day. Those three scripts are Job History, Read Errorlogs, and Last Backups (generic titles). Let’s look at Job History first since it has some of the most interesting edits applied. Here’s the original script:

Select j.name, jh.step_name, run_status, run_date, run_time, run_duration, [server], [message]
From 
	[msdb].[dbo].[sysjobhistory] jh
	join [msdb].[dbo].sysjobs j 
		on jh.job_id = j.job_id
Where 1=1
and run_status not in (1,2,4)
and run_date > 20190701 
and [step_name] <> '(Job outcome)'
Order by run_date  desc, run_time desc

--Select MIN(run_date) from msdb..sysjobhistory

Which renders the following output:

Meh, the run_date and run_time output is underwhelming and hard to read quickly. Imagine checking 100 servers or more and needing to quickly read the time/date stamps.

As I was working on my own edits, Kevin sent me a snippet of code changes—based on Ken Fisher’s blog post mentioned above—in order to get a much nicer output:

Select 
	j.name as [Job Name], 
	jh.step_name as [Job Step Name], 
	run_status as [Run Status], 
	msdb.dbo.agent_datetime(run_date,run_time) as [Job Run Time], 
	[message] as [Message]
From [msdb].[dbo].[sysjobhistory] jh
	join [msdb].[dbo].sysjobs j 
		on jh.job_id = j.job_id
Where 1=1 
	and run_status not in (1,2,4)
	and run_date >= 20190701
	and jh.step_name <> '(Job outcome)'
Order by 
	j.name,
	msdb.dbo.agent_datetime(run_date,run_time) Desc

Which gives us the following output:

You’ll also notice that I added some more of my own square bracket magic to make the column headers more readable. It’s the little details that sometimes make a big difference.

I did the same sort of thing to our Read Errorlog script, but added my own line to change the datetime stamp here as well. The original script had this line of code:

And Logdate > getdate() -3

Which rendered this result:

With Kevin’s guidance, I changed the line using the Convert() function and received the subsequent output:

convert(nvarchar(30), getdate() -3, 20) as [Error Date & Time], --convert datetime to readable format

Finally, we check for the latest backups each morning by using a script that…you guessed it…checks for the latest backups. The procedure goes something like this:

1. Run Last Backups script and get results (see screenshot).

2. Copy results with headers and paste into an Excel spreadsheet.
3. Sort and filter results in spreadsheet to make reading of latest backup timestamps easier on the eyes and fall in sequential order.
4. Report findings to client.

That’s all well and good, but since I studied the GROUP BY and ORDER BY commands in TSQL recently I thought, “Why are we going through the extra copy/paste spreadsheet sort/filter steps? Why not simply rewrite the script to sort the results for us?”

So that’s what I did. I simply edited the last line of the script (as well as do more square bracket magic):

Order by RecoveryMode, [Status], [LastFullDate], [LastLogDate], db.[Database]

And here’s what the output looks like now:

Nice! Sorted and filtered, reader-friendly last backups results.

Now, in the end, does any of this improve our client’s server efficiency? No, but what it does do is let ME be more efficient for our clients. There’s no reason to spend extra time unnecessarily.

Have I written super-complex code? Nope. Have I contributed something to the SQL community that a Senior DBA couldn’t have written in 2 minutes? Nope, but that’s not the point. The point is I learned from the experience of editing existing scripts and now have some slightly sharper tools in my toolbox. 🙂

That’s all for this week. Join me next time for the next episode in The Ironic DBA Files.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

Follow @Dallas_DBAs

The Ironic DBA—Review One: A SQL Story

August 13, 2019 by SQLandMTB Leave a Comment

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Last time around I related the exciting development of beginning to touch production servers. Now, with ten episodes in the can I think it’s time for a review of what I’ve learned and done so far. Stick with me as I hope to relate a few things I haven’t shared before, as well as some of my favorite tips, links, and resources that have helped me progresses as rapidly as reasonably possible.

What my brain looks like after a few months of DBA training.

A New Trail

I originally related how training to become a DBA was a new and unexpected opportunity, thus my reason for choosing the moniker “The Ironic DBA.” I’m not quite sure at this point whether I’m truly ironic, or using the word incorrectly like Alanis Morissette, but I digress…

Having a Senior DBA who can mentor you is absolutely the best approach to this career. I’m sure there are many intelligent people who could self-study and figure out all of this on their own via trial and error, but having an experienced DBA looking over your shoulder is one of the ways to progress much more quickly.

Not everyone can be as fortunate as I am to have a close friend be your personal coach, but that doesn’t mean you shouldn’t look for someone who can help you regularly, whether at your workplace or a local PASS meeting.

Mapping the Loops

In mountain biking, we most often have trails that are full loops—loops that when followed from beginning to end bring you back to the trail head where you originally started. Some trail systems are stacked loop systems, where successive loops are “stacked” upon each other, connecting in such a way as to give users options for extended distance or varied routes. Trail systems are often constructed this way so that one or two loops can be opened to the public while successive loops are constructed and added over time. It’s also common that the successive loops increase in difficulty as you progress through the loops.

That’s exactly the approach we’ve taken with my DBA training. To start, you’ve got to get a handle on the basics like nomenclature, systems, and base-level architecture. What is a database? What does relational database mean? What’s an instance? Before I ever seriously considered becoming a DBA, I had already attended Kevin’s Free SQL Server DBA Training twice because I’m his friend and designer. He asked me to help him evaluate his teaching and presentation. This means I was pretty familiar with basic concepts before we started any sort of formal training.

From there, you can begin mapping out your loops in order of priority and difficulty. There are many different ways to map out these paths—another reason why a good mentor will be a great trail guide. If for some reason, you’re going it alone, I hope this post (and my series in general) will help you. I would also highly recommend these posts here on Dallas DBAs: Top 10 SQL Server functions every DBA should know by heart, and Dear Junior DBA…

For the record, and in an effort to not belabor my previous posts, here’s a breakdown of the “loops” I’ve traveled so far:

But Wait…There’s More

As you might imagine, my weekly posts have only hit the high points and most critical information I’ve been gleaning. I’ve learned quite a few things along the way I haven’t written about previously, as well as collected several resources. Below is a sort of stream-of-consciousness recollection of other things I’ve learned that I think will help other Apprentice and Accidental DBAs kick-start their career. Excuse me while I scroll back in time in my Dallas DBAs Slack channels…

Since you’re going to be working on the files and file systems within SQL Server, get familiar with using the .mdf, .ndf, and .ldf standards. One of my first questions about these filetypes—because you can actually use whatever suffix you’d like (but shouldn’t)—was whether all three were identical filetypes. Kevin’s quick answer was, “No. There are (minimally) two file types… ‘Row Data’ (.MDF, .NDF) and ‘Log’…different internal structure that SQL writes to differently.”

By the way, this and other initial questions specifically came out of watching Kevin’s Pluralsight course, Getting Started with Your First SQL Server Instance. Use this as a foundation to get familiar with installing, updating, and uninstalling instances properly. And then get familiar with how to find and install test databases such as AdventureWorks, WideWorldImporters, and StackOverflow—using a test environment Azure VM if possible.

Oh, and get ready for a huge chunk of acronyms that look almost identical that you need to learn and know the differences: SSMS, SSIS, SSAS, SSRS…

I think I’ve mentioned it before, but staying on top of applying the latest Service Pack and Cumulative Updates to SQL Server instances is very important. A fantastic resource for quickly and easily finding the latest releases from Microsoft is the Microsoft SQL Server Versions List blog. You can drill down and find the correct version and download the appropriate SPs and CUs.

Being a visual person, I gravitated toward viewing Execution Plans early on. Of course, I didn’t understand a whole lot about what I was looking at, but it was good to begin getting familiar with the icons and arrows and such. Don’t sweat it, you’ll learn more and more about Execution Plans and how to read them as you move forward. Hint: Make sure you learn the difference between Estimated and Actual Execution Plans, and how to get those specific plan results.

A sneak peek at my Google Docs SQL Server training library.

I highly suggest building a Google Docs or Office 365 online library of notes and documentation on your journey.

I’m running out of space here, and haven’t covered anywhere near as much as I’d like, so I’m going to leave you with a couple of different lists I hope you’ll find helpful.

DBA Scripts and Tools

Great Links for a New DBA

(I’m currently working my way through Stairway to T-SQL DML.)

That’s all for this week. Join me next time for next episode in The Ironic DBA Files.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

Follow @Dallas_DBAs

The Ironic DBA Files—Episode 6: Return of the TSQL

July 16, 2019 by SQLandMTB Leave a Comment

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Last week I revisited the need to have a good understanding of Maintenance Plans, even if you end up rarely using them later on in your DBA career. As with all the other topics I’ve touched on so far, having a good understanding of the fundamentals will give you a solid foundation to start building your skills and knowledge upon. This week, let’s talk about code, specifically T-SQL.

Let Me Tell You What I Really Think

Honesty time, here—not that I don’t strive to be honest all the time anyway—I hate coding. I know how to code various languages at differing levels of proficiency, but I’ve never really enjoyed the process of learning a new language and parsing it. This is the number one reason why I avoided looking into SQL DBA as a career when Kevin first hinted at the possibilities a couple of years ago. The thought of sitting in front of a computer and banging out code for most of the day was a non-starter for me.

Thankfully, I learned that becoming a DBA involves some code wrangling, but not anywhere near as much as I expected. In fact, the early days of my training had me pretty much avoiding TSQL as much as practical so that I could concentrate on all the things I’ve previously written about.

However, there came a point where there was no option but to begin learning the basics of TSQL so I could better understand what was going on under the hood of SQL Server. Plus, though there are often multiple ways to get something done in the SSMS GUI, there are times when writing a quick script or copy/pasting a script is faster, more flexible, and more reliable for a particular database’s needs.

I’m not here this week to tell you I’m now a TSQL guru by any means. I’ve still got a long way to go, though I’m fairly comfortable writing very basic scripts and parsing slightly more complex ones. But I still hit a wall pretty quickly in understanding what’s going on within even a slightly long or complex script. That’s ok, I’ll get there.

A Few Tips I’ve Learned

First up, if you aren’t blessed to have a mentor guiding you along the path, do everything you can to find one. I can’t tell you how many times I would have been completely lost without Kevin there to fill in the gaps.
Second, don’t be afraid to read stuff that’s over your head. Just don’t spend too much time trying to figure out what it means if you’re not grasping it right away. Case in point: I often take the time to Google stuff and land on either SQLSkills.com, SQLAuthority.com, or BrentOzar.com. Those three sites have become my go-to sources when I want to figure out how to fix something or better understand how SQL Server works.
The difficulty is that these sites often go beyond my current levels of understanding fairly rapidly, especially when they start using script examples. That’s fine. Getting exposed to TSQL and reading explanations about what the scripts accomplish help me gain familiarity.
Third, don’t expect to fully understand a script unless you understand what it’s supposed to accomplish. This sounds similar and somewhat backward from my last point, but often I run into difficulty understanding a script because I simply don’t grasp what the script is doing to the database.
For instance, one of the tasks Kevin had me perform recently was to create a new Filegroup on my practice StackOverflow database, and then move that Filegroup file to a new drive on my VM. I needed to do it through TSQL scripting, which meant I had to learn what scripts to use, but I also had to have a fundamental understanding of what was actually happening to my database. I couldn’t have one without the other.

Fourth, get familiar with the Script button in your various SQL Server windows and dialog boxes. Often, when you’re performing some sort of job or task in the SSMS GUI, the software provides a little button up in the toolbar that will show you the actual script that will run once you’ve chosen your options, set various parameters, and hit “OK.”
Hitting that Script button (when available) before you click “OK” will pop up a new query window in the background and populate it with the script will be run. Use this as a learning tool to see how the choices you made in the task window create specific commands within your script to be executed.
Fifth, find yourself some good tutorials. Though it’s pretty basic in a lot of ways—and teaches the ANSI standard alongside TSQL—I’ve found the SQL tutorial at W3 Schools to be an excellent training resource. I’ve spent a few hours using it just this week to reinforce what I already knew and take my TSQL knowledge to the next level.

What have you done to help yourself learn TSQL? I’d love to hear your own tips and tricks so I can try them out for myself.

That’s all for this week. Join me next time for The Ironic DBA Files—Episode 7: The Backup Awakens.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

Follow @Dallas_DBAs

The Ironic DBA Files—Episode 5: The Maintenance Plan Strikes Back

July 9, 2019 by SQLandMTB Leave a Comment

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

In the previous post I rambled about for a bit talking about primary keys and clustered index keys. That writeup was very tip-of-the-iceberg stuff and there’s much, much more ground to cover on that topic in the future. This week I circle back around again to touch on a topic I’ve already written about in the past: Maintenance Plans.

You’re Gonna Use That?

I’ve learned that Maintenance Plans have something of a bad rap among experienced DBAs, and I can partially see why. If you’re new to this game like me, let me give you the two main reasons I’ve pinpointed as to why a Senior DBA might tell you to not use Maintenance Plans.

Misconfiguration: Like so many others functions within SQL Server, it’s very easy for an Accidental DBA to choose the wrong options and parameters. I’m not going to run through best practices here, but simply knowing that you can screw up your database by incorrectly implementing a Maintenance Plan is something to be aware of.
Inferior Product: While SQL Server is itself an enterprise-level piece of software powerful enough to encourage companies and governments worldwide to use it for their data, let’s admit that it’s not the best at everything. When it comes to performing regular maintenance tasks on your server, there are better options out there if you know where to look.

Here’s the core issue with both of the complaints above—an inexperienced DBA doesn’t know any better. And as we’ve discussed before, so many DBAs are not trained, let alone full time. They were thrown into the server room by the boss and told, “fix it!” just because they were the staff computer geek.

If an Accidental DBA digs just a little bit into the workings of SQL Server they’ll discover the Maintenance Plan Wizard and think, “Hey! This is easy!” The problem is that the Wizard assumes you know what every available option does and how it affects your data. They also don’t have the time or experience necessary to understand how to properly implement a third-party set of scripts. Think about it, if a new DBA doesn’t understand how to properly implement a built-in set of maintenance scripts, how can they be expected to implement a more complex third-party product?

Like me, a new DBA is going to look at the Maintenance Plan Wizard their first time out and see the shrink option and mistake it for compression. Compression the database is a good thing, right? So shrinking my database is a great idea! Making a mistake like this can create a database that performs horribly, and the Wizard makes no attempt to prevent you from making poor choices. Even newer versions of SQL Server with an updated Maintenance Plan Wizard don’t help a junior or apprentice DBA understand what the options mean.

Brent Ozar (b|t) said, “If maintenance plans are designed for the accidental and junior DBAs amongst us, I don’t think a lot of this stuff [options included in the Maintenance Plan Wizard] should even be an option. It should just default to the right thing, and take care of the database with Microsoft’s best practices set up as standard.” SQL Server 2016 Maintenance Plan Improvements.

So here’s my attempt to help my fellow apprentice DBAs better understand how to implement a rational set of Maintenance Plans that won’t simply end up making things worse.

Maintenance Plans for Apprentice DBAs

First of all, the best thing you can do for yourself is to go and learn what the various options in the Maintenance Plan Wizard can do, and what implementation of those options will do to your database.

Second, one of the very first things I learned from Kevin is that you should separate all the different Maintenance Plan options you intend to implement into separate tasks. Simply put, while it’s easy within the Wizard to throw together one plan that does an integrity check, an index rebuild, and a full backup, it’s always better to keep them separate and scheduled during appropriate maintenance windows.

Third, related specifically to backups, another guideline Kevin shared early on is that any server in production should be using the FULL recovery model. Why? Because otherwise you can’t perform Transaction Log backups, which means you lose the ability to apply point-in-time restores.

If you don’t know this already, a log backup truncates the transaction log when using either the full or bulk-logged recovery models. A regular log backup helps keep your transaction log from growing uncontrollably, allows you to perform the aforementioned point-in-time restores, and limits data loss during database recovery.

Other quick points:

Perform Check Database Integrity Often: Do this DAILY if possible, but at least WEEKLY. Always schedule this during a wide maintenance window if possible. (Read more about CheckDB back in Episode 2.)
Learn the Difference Between Rebuild and Reorganize Index: It sounds simple, but this is an instance where you need to understand a bit about what’s going under the hood of SQL Server and how rebuild and reorganize are not the same thing. One of the chief points to understand and learn more about is the fact that rebuilding indexes automatically triggers updating of statistics, while reorganization does not.
Create and Schedule Backups to Meet RPO and RTO Standards: If you’re working on a server in production, don’t just throw together a set of backups and call it a day. You need to know what your company/client expects in terms of acceptable data loss and down time. Your set of Maintenance Plan backups must be crafted to fit these expectations.
Never Use Shrink Database: NEVER. Perform this manually on demand when necessary. Check out Paul Randal’s (b|t) in-depth look at why you should not shrink your data files.
Learn About the Other Tasks: There are three other major tasks you can perform with a Maintenance Plan—Maintenance Cleanup, Execute SQL Server Agent Jobs, and History Cleanup. I won’t cover these here, but go and learn about them once you’ve got a handle on the other steps above.

That’s all for this week. Join me next time for The Ironic DBA Files—Episode 6: Return of the TSQL.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

Follow @Dallas_DBAs

The Ironic DBA Files—Episode 4: A New Primary Key

July 2, 2019 by SQLandMTB Leave a Comment

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Last time we took a look at some nomenclature related to SQL Server and touched on the basic differences between clustered and nonclustered indexes.

I’ll be honest here in this post—I’m worn out—but it’s the good kind of tiredness. Those of you who have been DBAs for a year or more probably don’t realize just how much knowledge you’ve absorbed and put into practice since you started. And those who are Senior DBAs…well…your knowledge can often look like magic to the uninitiated apprentice. (Insert Gandalf image here.)

I’m deep enough into the weeds that I’m finding it a bit hard to drill down to something specific to write about this week, but I’m going to give it my best try. Apologies in advance if this seems like a bit of a ramble.

The Keys to the Kingdom

Picking up a bit where I left off last week, let’s talk about Primary Keys.

Remember, a table is made up of columns and rows. Data in a database table is stored in the “cells,” or fields. Typically, tables in SQL Server have a column or set of columns that contain values to uniquely identify each row. This column/column set is known as the Primary Key.

A table can only have ONE Primary Key, though—as mentioned above—it is possible that a Primary Key can consist of either single or multiple columns.

How do you know if your table has a Primary Key? Well, if you created the table yourself and haven’t built a clustered index on that table yet, your table probably doesn’t have a Primary Key unless you created a primary key at the time of table creation.

Here’s a quick bit of code that shows how you might create a new table with a primary key constraint.

-- Create table with primary key constraint
CREATE TABLE Projects
(
    id INT NOT NULL,
    name VARCHAR(255),
    start DATE NOT NULL,
    end DATE NOT NULL,
    CONSTRAINT PK_Projects_name PRIMARY KEY (name)
);

The above snippet of T-SQL creates a table named “Projects” with four columns: id, name, start, and end. The project_name column is defined in the code snippet as the table’s primary key constraint.

Now, what you need to realize at this point is that as soon as a Primary Key is created on a table, that table is now a clustered index. The Primary Key is now also a clustered index key.

Note: The above column names are really not all that great, but are kept simple for illustrative purposes to my fellow apprentices.

I Need a Different Clustered Index Key

What you may or may not recognize right away is that a table’s Primary Key might not be the best value to build a clustered index upon. Hopefully you’ll have been able to put some thought into the process of building a new database’s architecture and have a chance to set keys up as best you can from the start.

Even if you get that opportunity, over time you may discover that user’s queries would be better served with a different indexing structure—which leads you to figure out how to drop a Primary Key in order to build a new clustered index.

Before we move on, let’s take a moment to consider what makes a great clustered index key. [Hat tip to the highly recommended Pluralsight Course SQL Server: Indexing for Performance by Kimberly L. Tripp. (b|t)]

Characteristics of a good clustered index key:

Unique—Value is not duplicated elsewhere.
Static—Value will not change.
Narrow—Value is as specific as possible.
Non-nullable—Value cannot be NULL.
Fixed width—Value of the key has been assigned a fixed width.
Ever-increasing—Value increases over time.

So, what happens if sometime in the future you determine the current Primary Key isn’t the best choice for your clustered index? You have to perform two T-SQL commands. You have to drop the primary key constraint and then build a new clustered index.

--Drop constraint
ALTER TABLE Projects
DROP CONSTRAINT PK_Projects_name
GO

-- Create clustered index
CREATE CLUSTERED INDEX IX_Projects_id
     ON dbo.Projects (id)
GO

HT: Pinal Dave (b|t)

Go ahead and test all the above code snippets out on your test instance. Wait, you don’t have a test instance?

Guess what? You can set up a inexpensive test instance by setting up an Azure account and creating a VM. Just be sure to pay attention to how you structure your resources and how long you keep it running each day. There’s no need to go crazy and create a VM that’s an uber-computer and leave it running 24/7. That’s a quick way to go bankrupt.

That’s all for this week. Join me next time for The Ironic DBA Files—Episode 5: The Maintenance Plan Strikes Back.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

Follow @Dallas_DBAs

The Ironic DBA Files—Episode 3: Revenge of the Index

June 25, 2019 by SQLandMTB Leave a Comment

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Last week I shared the latest I had learned on three main subjects—DBCC CheckDB, Normalization, and SQL Server Security. Since that last post, while continuing my studies and thinking about what to write this week, I realized that I need to take a few steps backwards in my explanations of things.

You see, part of the purpose of this series is to not only share my triumphs and struggles in a way that hopefully encourages other new DBAs, but to also explain concepts in a friendly, non-threatening manner along the way. So, that means we’re going to take a couple of steps back before we move forward.

Can I Set Your Table for You?

fancy restaurant dinner table — Nope, not this kind of table…

The problem with tech-related jargon is that we often assume others automatically understand what we’re talking about when we use it. I haven’t been in the SQL game very long but I’m becoming more and more comfortable with the nomenclature. That doesn’t mean that everyone starting out on their path to becoming a DBA understands even the most basic terms. Let’s look at a few.

I’ve found it best to think about the parts of a SQL Server database in terms of a top-down hierarchy. So this also means we’re not going to start with the actual database, but a level above. Let’s look at instances.

An instance specifically refers to an instance of the Database Engine—the part of the software that makes SQL Server work. According to Microsoft Docs, “An instance of the Database Engine is a copy of the sqlservr.exe executable that runs as an operating system service. Each instance manages several system databases and one or more user databases. Each computer can run multiple instances of the Database Engine. Applications connect to the instance in order to perform work in a database managed by the instance.”

Did you catch that? Not only can you install multiple instances onto a server—in fact you can run different versions such as 2008, 2016, and 2017 all at the same time—but the instance manages the databases. Though not completely technically correct, you could visualize instances as holding, or containing, the databases.

In SQL Server, the databases are relational, meaning that the databases are structured in a way to recognize relationships among stored items of information. How are these items of information stored? In a table.

You can conceptualize a table like a spreadsheet. The data contained within “cells” stored in rows and columns. When the database engine performs a search query it filters through the available data by searching through the table columns row-by-row.

Though the stored data is held in a table configuration, SQL Server doesn’t physically store everything in a bunch of individual spreadsheet files like Excel. SQL Server actually stores the data row-by-row in 8K pages. The number of rows that can be stored on each page depends on how much data is contained in each row. There’s a bit more to the architecture of SQL Server pages, but grasping this much is sufficient for now. (In the future we’ll also talk about extents.)

When a table is indexed—we’ll talk about indexes below—the pages are arranged in a B-tree structure as illustrated below. We’re not going to go too deep into this for now, but take a look at this diagram to get a basic sense of index structure.

Above you’ll see a few new terms to learn. We’re not going to go too deep here, just understand that the hierarchical structure you see above is an illustration of a clustered index. The pages in this illustration are physically organized by a logical numerical order. The top root level is an 8K page that points to data references in the top-most intermediate level. The intermediate levels consist of more 8K pages that continue directing SQL Server downward to the appropriate page on the bottom level. This bottom level is called the leaf level and it’s where all of the table’s data is stored row-by-row in 8K pages.

Don’t worry about B-tree structure too much at this point in your career. Just keep this diagram handy as you move forward and refer to it as necessary to help you visualize index structure.

How Do I Find Your Phone Number?

White Pages — Look at this ancient technology!

To understand indexes, let’s do a little more basic terminology. You’re going to hear the terms logical and physical often in relation to explaining how your data is structured. So what do those terms mean?

Let’s throw out the technical terms here in an effort to solidify the distinction. Remember above where I talked about your data being stored in 8K pages? Let’s imagine you’re carrying a stack of papers, each printed with hundreds of names listed in alphabetical order—your data. If you trip and fall, scattering these papers all over the floor, how will you approach picking them up?

If you’re in a hurry to get somewhere, you’re unlikely to spend the time necessary to put the sheets in alphabetical order. You’d just randomly pick up the pages and get on your way. Whatever order you picked up the papers becomes the physical order of your data. Keep in mind that the data on each individual page hasn’t changed, but now the stack is completely jumbled and out of order. There’s still a logical order, but the pages are not arrayed in a manner where the physical order matches that logical order. In SQL Server terms, this is called a heap.

Now, if you took the time to pick up the papers and organize the entire stack alphabetically, you’d be creating an index. In SQL Server, when you create a clustered index, ALL of the data in your table is reordered so that the physical order of your data matches the logical order. This logical order is keyed to a specific column (or group of columns) and may be alphabetical, numerical, or based on another sort of hierarchical order. (Refer back to the B-tree diagram above for a visualization of this physical ordering.)

Indexes are often explained to new DBAs in analogies related to physical books. Those analogies are great, but let’s make sure you get this down from the start. There are two types of indexes, clustered and nonclustered.

Don’t get lost in the weeds here. As a new DBA here’s what you need to know: Every table in your database should have a clustered index, and each table may have only one clustered index. Why?

It’s best to visualize a clustered index as if it were an old telephone book—specifically the white pages where every resident in town is listed within the phonebook’s pages in alphabetical order by last name. Such a book could only have one order because it is the sole copy of the data.

When you create a clustered index for a table it BECOMES the data. It IS the data reorganized by a specified column. This was a gap in my understanding for a little while. At first I thought that a clustered index, when created, was just simply an additional file that pointed to data locations within the table. This is not the case. A clustered index becomes the data, or rather it IS the data organized in a specific way.

However, a nonclustered index is a file that simply contains pointers to the data in the table. A nonclustered index is more like the index in the back of a book that is organized by subject. This type of index doesn’t hold the actual data but rather tells you where to find a certain type of data more quickly. Tables in SQL Server can have many nonclustered indexes to help the Database Engine perform search queries more efficiently.

This doesn’t mean you want to add dozens and dozens of indexes in an attempt to cover every query. Check out this great video from Pinal Dave (b|t) where he proves that the presence of just one additional unused index can actually significantly slow down a query. This is really beyond the scope of our current discussion, but I thought it was worth pointing out. You’ll probably hear more than once in your career something like, “Tables in SQL Server can have 249 nonclustered indexes. This is limit, not a goal.”

Just because something’s possible doesn’t mean it’s a good idea.

That’s all for this week. Join me next time for The Ironic DBA Files—Episode 4: A New Primary Key.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.

The Ironic DBA Files

Follow @Dallas_DBAs