The Ironic DBA Files—Episode 3: Revenge of the Index

Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.

Last week I shared the latest I had learned on three main subjects—DBCC CheckDB, Normalization, and SQL Server Security. Since that last post, while continuing my studies and thinking about what to write this week, I realized that I need to take a few steps backwards in my explanations of things.

You see, part of the purpose of this series is to not only share my triumphs and struggles in a way that hopefully encourages other new DBAs, but to also explain concepts in a friendly, non-threatening manner along the way. So, that means we’re going to take a couple of steps back before we move forward.

Can I Set Your Table for You?

fancy restaurant dinner table — Nope, not this kind of table…

The problem with tech-related jargon is that we often assume others automatically understand what we’re talking about when we use it. I haven’t been in the SQL game very long but I’m becoming more and more comfortable with the nomenclature. That doesn’t mean that everyone starting out on their path to becoming a DBA understands even the most basic terms. Let’s look at a few.

I’ve found it best to think about the parts of a SQL Server database in terms of a top-down hierarchy. So this also means we’re not going to start with the actual database, but a level above. Let’s look at instances.

An instance specifically refers to an instance of the Database Engine—the part of the software that makes SQL Server work. According to Microsoft Docs, “An instance of the Database Engine is a copy of the sqlservr.exe executable that runs as an operating system service. Each instance manages several system databases and one or more user databases. Each computer can run multiple instances of the Database Engine. Applications connect to the instance in order to perform work in a database managed by the instance.”

Did you catch that? Not only can you install multiple instances onto a server—in fact you can run different versions such as 2008, 2016, and 2017 all at the same time—but the instance manages the databases. Though not completely technically correct, you could visualize instances as holding, or containing, the databases.

In SQL Server, the databases are relational, meaning that the databases are structured in a way to recognize relationships among stored items of information. How are these items of information stored? In a table.

You can conceptualize a table like a spreadsheet. The data contained within “cells” stored in rows and columns. When the database engine performs a search query it filters through the available data by searching through the table columns row-by-row.

Though the stored data is held in a table configuration, SQL Server doesn’t physically store everything in a bunch of individual spreadsheet files like Excel. SQL Server actually stores the data row-by-row in 8K pages. The number of rows that can be stored on each page depends on how much data is contained in each row. There’s a bit more to the architecture of SQL Server pages, but grasping this much is sufficient for now. (In the future we’ll also talk about extents.)

When a table is indexed—we’ll talk about indexes below—the pages are arranged in a B-tree structure as illustrated below. We’re not going to go too deep into this for now, but take a look at this diagram to get a basic sense of index structure.

Above you’ll see a few new terms to learn. We’re not going to go too deep here, just understand that the hierarchical structure you see above is an illustration of a clustered index. The pages in this illustration are physically organized by a logical numerical order. The top root level is an 8K page that points to data references in the top-most intermediate level. The intermediate levels consist of more 8K pages that continue directing SQL Server downward to the appropriate page on the bottom level. This bottom level is called the leaf level and it’s where all of the table’s data is stored row-by-row in 8K pages.

Don’t worry about B-tree structure too much at this point in your career. Just keep this diagram handy as you move forward and refer to it as necessary to help you visualize index structure.

How Do I Find Your Phone Number?

White Pages — Look at this ancient technology!

To understand indexes, let’s do a little more basic terminology. You’re going to hear the terms logical and physical often in relation to explaining how your data is structured. So what do those terms mean?

Let’s throw out the technical terms here in an effort to solidify the distinction. Remember above where I talked about your data being stored in 8K pages? Let’s imagine you’re carrying a stack of papers, each printed with hundreds of names listed in alphabetical order—your data. If you trip and fall, scattering these papers all over the floor, how will you approach picking them up?

If you’re in a hurry to get somewhere, you’re unlikely to spend the time necessary to put the sheets in alphabetical order. You’d just randomly pick up the pages and get on your way. Whatever order you picked up the papers becomes the physical order of your data. Keep in mind that the data on each individual page hasn’t changed, but now the stack is completely jumbled and out of order. There’s still a logical order, but the pages are not arrayed in a manner where the physical order matches that logical order. In SQL Server terms, this is called a heap.

Now, if you took the time to pick up the papers and organize the entire stack alphabetically, you’d be creating an index. In SQL Server, when you create a clustered index, ALL of the data in your table is reordered so that the physical order of your data matches the logical order. This logical order is keyed to a specific column (or group of columns) and may be alphabetical, numerical, or based on another sort of hierarchical order. (Refer back to the B-tree diagram above for a visualization of this physical ordering.)

Indexes are often explained to new DBAs in analogies related to physical books. Those analogies are great, but let’s make sure you get this down from the start. There are two types of indexes, clustered and nonclustered.

Don’t get lost in the weeds here. As a new DBA here’s what you need to know: Every table in your database should have a clustered index, and each table may have only one clustered index. Why?

It’s best to visualize a clustered index as if it were an old telephone book—specifically the white pages where every resident in town is listed within the phonebook’s pages in alphabetical order by last name. Such a book could only have one order because it is the sole copy of the data.

When you create a clustered index for a table it BECOMES the data. It IS the data reorganized by a specified column. This was a gap in my understanding for a little while. At first I thought that a clustered index, when created, was just simply an additional file that pointed to data locations within the table. This is not the case. A clustered index becomes the data, or rather it IS the data organized in a specific way.

However, a nonclustered index is a file that simply contains pointers to the data in the table. A nonclustered index is more like the index in the back of a book that is organized by subject. This type of index doesn’t hold the actual data but rather tells you where to find a certain type of data more quickly. Tables in SQL Server can have many nonclustered indexes to help the Database Engine perform search queries more efficiently.

This doesn’t mean you want to add dozens and dozens of indexes in an attempt to cover every query. Check out this great video from Pinal Dave (b|t) where he proves that the presence of just one additional unused index can actually significantly slow down a query. This is really beyond the scope of our current discussion, but I thought it was worth pointing out. You’ll probably hear more than once in your career something like, “Tables in SQL Server can have 249 nonclustered indexes. This is limit, not a goal.”

Just because something’s possible doesn’t mean it’s a good idea.

That’s all for this week. Join me next time for The Ironic DBA Files—Episode 4: A New Primary Key.

Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.