What are duplicate indexes, and why do I care?
This is an entry level explanation, with an analogy for new DBAs.
Duplicate indexes are those that exactly match the Key and Included columns. That’s easy.
Possible duplicate indexes are those that very closely match Key/Included columns.
Why do you care?
Indexes have to be maintained. When I say that, most people immediately think of Reorganizing, rebuilding and updating statistics, and they are not wrong.
But..don’t overlook the updates that happen to indexes when the data changes in the columns they are based on. If you have a duplicate index and you add, change or delete a row…BOTH of the indexes are changed. This takes CPU, memory and log space to do. Magnify by multiple indexes across a databases with tables that have millions or billions of rows and you start feeling this effort.
Consider the following two indexes:
-- DisplayName only CREATE INDEX [NC_DisplayName] ON [dbo].[Users] ([DisplayName] ASC) GO --DisplayName plus additional info CREATE INDEX [NC_DisplayName_Includes] ON [dbo].[Users] ([DisplayName] ASC) INCLUDE ([Reputation],[CreationDate],[UpVotes],[Views]) GO
If you query the StackOverflow database for me (Kevin3NF) using
Select DisplayName From Users Where DisplayName = 'Kevin3NF'
It will use the first index *
If you add ‘Upvotes’ to the Select, it will use the second index.
So…how is this a duplicate index?
The Key column (DisplayName) is the same.
Drop the NC_DisplayName index, and your first query will use the second index because even though there are 4 additional columns, the Key column is still there and this index is better than a table scan.
You also get the benefit of not having to update NC_DisplayName any time data is changed.
An Analogy of Two Doors:
You have two doors:
The First door is simple. It has one aspect to it…a handle.
The Second door has a handle, as well as all sorts of Included extras.
Both doors lead to the same place. Assume some people like the simplicity of Door1 and some people really want to go through the Steampunk style Door2. Both get what they want (destination), by picking the door that suits them.
If you take away Door1 (Drop Index), the folks that prefer it can simply go through the fancy Door2 and get where they needed to be. Door2 has included extras (columns) that the “Simple door” folks just ignore.
It really is that simple.
But…don’t go make an index with all possible columns! That’s a whole different kind of bad indexing.
Now…this does not mean you need to go blindly dropping indexes. Research, test, verify which indexes are being used/unused, etc. Definitely don’t go and drop a Primary Key index just because its a possible duplicate.
I use sp_BlitzIndex for my initial info gathering when doing HealthChecks, Index Tuning, etc. Its free and solid.
* I have run into a weird issue I am trying to sort out. In my testing, it appears that SQL Server might be basing its index decisions on the Index Creation date, all other things being equal. I will update as I find out more. This was done on SQL 2016, SP2 with the StackOverflow public data dump.
Per my good friend Pinal Dave (b|t), this is a known behavior: ” if two index has same key columns the one created later is used.”
Go forth and research!
Thanks for reading!