Sunday 16 July 2017

A question on index and statistic columns.

This post is a question around how SQL Server creates statistics for a new index or in other words - how can the columns for an index and it's statistics be the opposite away round from one another?!

Here's what I originally found when having a poke around a database; it's a pretty basic clustered index (with names blanked out to protect the innocent).


We can see that the leading column is a varchar(20) type and the next one is varchar(50). Now let's have a look at the statistics for this particular index, just for info this is the only index on the table.


This time the the leading column is the varchar(50) which is then followed by the varchar(20) column, hmm. Now column order is pretty important and interestingly enough the varchar(50) column is actually the more selective of the two so I wondered if this why perhaps the statistics are in a different order. 

In order to test this I've used an old SQL 2014 test database that's been hanging around on a dev instance of mine. It has a very simple table composing of an ID field, first name and last name. Incidentally the first name and last name columns are varchar fields with lengths of 20 and 50.

Here's a new clustered index based on the first and last name columns:


After I have created it (yeah I know the name sucks btw) I'll check the statistics:


That seems fine, or at least the order is the same as the index, which we'd expect. 


Now lets recreate the index but modify the column order so it looks like this, with the last name now the leading column:


Now if I check the statistics...


They are in a different order to how I have just defined my index. 

Now originally I did wonder if the statistics had been manually altered, however just to rule that out if you try to change the columns of statistics in SSMS, you get the following:


Now of course this is with a clustered index, what happens if I try the same with a non clustered index?


Here is my new index where once again I have altered the column order. The statistics this time look like this:


Okaaay, this time the statistics reflect the column order of the non clustered index that I've just created. This makes (at least in this example), the statistic creation process different between a clustered and non-clustered index.

So the question, why the difference? Has SQL Server decided on the best column order for statistics for the clustered and non clustered indexes or has the creation process for the clustered index just not picked up on the column modification or does it even use another method when creating stats?