Welcome back to The Ironic DBA Files, a series where the newbie DBA on staff at Dallas DBAs chronicles the ups and downs of fast-tracking a new career as a DBA.
Last time we took a quick look at getting started learning TSQL, and this week will keep it pretty basic as well. Let’s talk about indexes again.
Indexes Are Part of the Job
Again? Yep, indexing properly is really the soul of how SQL Server operates. There’s a lot of physical and logical processing going on under the SQL Server hood, but server performance can live or die depending on the quality of the indexes.
Don’t believe me? Take a few minutes to watch Brent Ozar (b|t) give a quick tutorial on how to use the sp_BlitzIndex to see just the tip of the iceberg of indexing issues. (I won’t spoil it, but some of the terms for indexing problems are hilarious.)
Before we go too far here, let’s not get lost in the weeds. I’m learning that much of a DBA’s job is figuring out why SQL Server is running slowly. There are many reasons that could cause a server to bog down—from hardware issues to poorly written queries—and bad indexing is just one of those reasons.
Also, let’s always keep in the back of our mind that our job is to protect the data, so make sure you’ve got a handle on the basics of DBCC CheckDB and backups. If you don’t, you’re not ready for indexes.
Check out this quote from Erin Stellato (b|t) on why indexing takes a back seat.
I consider consistency checks, often referred to as CHECKDB, as one of the next most important tasks for a DBA. And if you’ve been a DBA for a while, and if you know how much I love statistics, you might wonder why fragmentation and statistics take third place. Well, I can fix fragmentation and out-of-date/inaccurate statistics at any point. I can’t always “fix” corruption. https://www.sqlskills.com/blogs/erin/the-accidental-dba-day-13-of-30-consistency-checking/
By the way, if you haven’t read the Accidental DBA Series at sqlskills.com yet, what are you waiting for?
Indexes Need Love and Attention
I have a perception that the typical Accidental DBA—once they learn about the power of indexes—takes a “set it and forget it” approach. It might even be tempting to sell ourselves the line, “Well, a bad or out-of-date index is better than no index.”
The problem is the F-word. Yep, I’m going there. I’m going to use the F-word on the Dallas DBA’s site and hope I don’t get fired. (Fired is a completely different F-word.) Here it is:
Do you ever remember the need to defragment your computer’s hard drive? It was periodically necessary because, as your operating system created, deleted, or changed information on your computer, the data was often seemingly placed willy-nilly around the hard drive’s available space. A document you created might literally have its bits split up and stored on different sectors of your HD, meaning the computer had to search harder to find all of the disparate file fragments every time you wanted to open that document.
The same basic principle is true in SQL Server. Remember back to the very beginning of your SQL Server studies when you were working on learning the underlying structure of how everything worked? Let’s review.
If you recall, all of the data in SQL Server’s tables is stored in 8K pages. Every time data is modified in a table then data is modified in related indexes as well. (Yes, I’m oversimplifying.) This means that the underlying data on those 8K pages gets shuffled around as necessary—fragmentation.
According to the MS Docs, “Fragmentation exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file. Heavily fragmented indexes can degrade query performance and cause your application to respond slowly, especially scan operations.”
So, what to do when your database is fragmented and begins running poorly? You’ve got two basic options—REBUILD and REORGANIZE—and you need to understand the differences between the two.
REORGANIZE: A REORGANIZE operation uses lower system resources than a REBUILD. It defragments the clustered and nonclustered indexes at the leaf level by physically reordering the pages to match the logical order of the leaf nodes. REORGANIZE also compacts the index pages based on the existing fill factor value.
REORGANIZE is performed online so table locks are not held. This means queries and modifications are not blocked during a REORGANIZE operation. REORGANIZE also does NOT update the statistics for the index, so you may need to consider an additional UPDATE STATISTICS operation.
REBUILD: A REBUILD operation completely drops and recreates indexes. This index rebuild removes fragmentation and compacts the pages based on a specified or existing fill factor setting. A REBUILD operation reorders the index rows in contiguous pages.
Since a REBUILD operation destroys and recreates indexes, it is a blocking procedure. Queries will be unable to access the index pages until the operation is complete. Because the REBUILD operation essentially recreates the index it must update the index statistics, so there is no additional need to perform an UPDATE STATISTICS operation.
Wait! Statistics? What’s that?
That, my friends, is a topic for a future discussion…
But wait! What’s this fill factor thing? (Gah! That’s two more F-words!)
Check out Brent Ozar’s explanation: “‘Fill factor’ is a setting for indexes in SQL Server. When you create or rebuild an index, you can tell SQL Server what percentage of each 8k data page used in the ‘leaf’ level of the index it should fill up.
In other words, if you set a 90% fill factor when you rebuild a clustered index, SQL Server will try to leave 10% of each leaf page empty. The empty space will be present on the page as it’s written on disk, and it’ll take up the same amount of space as it sits in memory.
By default, SQL Server uses a 100% fill factor and tries to fill up all the pages in indexes as close to full as it can. Depending on how many rows actually fit on the page.”
Does it seem like I’ve left a lot out that you should learn? Yes, I have. There’s only so much room in a single blog post and you owe it to yourself and your career to go do some digging on your own.
Hey, Senior DBAs! What are some issues related to this rebuild vs. reorganize discussion a newbie needs to know about? Tell us something you’ve learned that you wish you had known when you were a Junior. The comments await your wisdom.
That’s all for this week. Join me next time for The Ironic DBA Files—Episode 9: The Rise of the Clients.
Follow me on Twitter at @SQLandMTB, and if you’re into mountain bikes come over and check out my site NTX Trails.
Jeff Moden says
Thanks for the article. I appreciate anyone and everyone that takes the time to share knowledge.
Getting right to the point, you wrote…
“REORGANIZE also compacts the index pages based on the existing fill factor value.”
Do you actually know what that means? The tl:dr; meaning is that should should ALMOST Never use REORGANIZE because any index that is fragmenting will, in one form or another, be made to perpetually fragment by REORGANIZE. That also means that the “Best Practices” (do nothing until 5% logical fragmentation, reorg between 5 and 30% and rebuild at > 30%) that everyone has been using for their index maintenance for more than 2 decades is totally wrong and is actually a VERY WORST PRACTICE. Books Online has all the warnings but a couple of words used incorrectly convinced the world of the wrong thing. Even the original author of “the numbers”, Mr. Paul Randal, says so.
How bad is it? How about if I told you that if you learn to maintain your Random GUID indexes correctly, that you can insert 100,000 rows per day for 58 days into a Random GUID index before the index hits 1% fragmentation and when it hits that very low amount of fragmentation, you MUST rebuild it and NOT reorganize it to get another couple of months of fragmentation free ops out of the index?
Yeah… I know what you’re thinking… words of a total lunatic, right? It all violates everything anyone has ever taught about indexes and Random GUIDs, right? It’s just too heterodoxical to be anywhere close to the truth, right?
Heh… while I’ll admit to being a bit of a madman when it comes to SQL Server (I AM the guy that coined the term RBAR), I’m not so mad as to make such a claim without absolute proof. Please take the hour and 20 minutes to watch the following presentation to the end. I’m sure you’ll recognize the sponsor, as well. 😉