Troubleshooting is a
fundamental activity for any DBA and when it comes to SQL Server one of the most
important tasks is analysing the wait statistics that have occurred within the
instance.
So what are wait statistics? In short
whenever a task is “waiting” its type of wait and duration are recorded by SQL
Server. This information is incredibly useful and we can query DMV's (Dynamic Management Views) to obtain this information that we can use to aid our troubleshooting efforts.
One particular view is the
sys.dm_os_wait_stats DMV. This is an aggregated view that shows us the waits
that have been occurring in our instance since the last restart or when the
wait statistics were cleared down. Queries like the one below are typically
sorted by the wait_time_ms column to indicate the most prominent wait types that
have occurred over time in our SQL server.
SELECT TOP 10 * FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC
Running this query without
as it is will produce a lot of different wait types so it’s important to know that
many of them are likely to be a result of completely normal operations that are
being undertaken by SQL Server. This isn’t the problem
with sys.dm_os_wait_stats that I am going to demonstrate today though so I will
keep the queries very simplistic. What I want to show you is that although this
view is great as a summary, it isn’t necessarily helpful when it comes to
real-time analysis.
I’ll start by creating
a new table in a non-production database on a non-production instance of SQL
Server and insert a handful of rows.
CREATE TABLE [dbo].[tblPeople](
[ID]
[smallint] IDENTITY(1,1) NOT NULL,
[FirstName]
[varchar](50) NULL,
[LastName]
[varchar](50) NULL
) ON [PRIMARY]
GO
USE [TestingDatabase]
GO
INSERT INTO tblPeople (FirstName, LastName) VALUES ('Peter','Morgan');
INSERT INTO tblPeople (FirstName, LastName) VALUES ('Sarah','Jones');
INSERT INTO tblPeople (FirstName, LastName) VALUES ('Liam','Davies');
INSERT INTO tblPeople (FirstName, LastName) VALUES ('Joanne','Jenkins');
In order to remove any
previous information and clear down the wait statistics we have to run the
following DBCC command:
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
To force a wait in SQL I will create a blocking
scenario by issuing an update command on our new table without committing or
rolling back the transaction.
BEGIN TRAN
UPDATE tblPeople
SET LastName = 'Davies' WHERE LastName = 'Jenkins'
In another window (to stress,
use another window) I will try to select data from that table knowing that I
won’t be able to because it is being updated by another transaction. As it cannot
acquire a lock my select query it should start to accumulate a LCK wait type.
SELECT TOP 1000 [ID]
,[FirstName]
,[LastName]
FROM [TestingDatabase].[dbo].[tblPeople]
I’ll leave that query
running for a few seconds and then look at sys.dm_os_wait_satas again and as
mentioned previously I’ll order by wait_time_ms descending so I can see the
worst offenders in my instance since the stats were cleared down. At this point
I might expect that I will see a lock wait type appear very high in the list.
SELECT TOP 10 * FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC
wait_type
|
waiting_tasks_count
|
wait_time_ms
|
max_wait_time_ms
|
signal_wait_time_ms
|
SP_SERVER_DIAGNOSTICS_SLEEP
|
531
|
300000
|
299995
|
300000
|
LOGMGR_QUEUE
|
1109
|
142871
|
156
|
6
|
DIRTY_PAGE_POLL
|
1417
|
142842
|
122
|
16
|
SLEEP_SYSTEMTASK
|
9041
|
142832
|
54
|
98
|
HADR_FILESTREAM_IOMGR_IOCOMPLETION
|
285
|
142670
|
503
|
7
|
LAZYWRITER_SLEEP
|
150
|
142097
|
1008
|
5
|
SQLTRACE_INCREMENTAL_FLUSH_SLEEP
|
35
|
140008
|
4000
|
0
|
XE_TIMER_EVENT
|
54
|
140000
|
4999
|
140000
|
REQUEST_FOR_DEADLOCK_SEARCH
|
28
|
139990
|
5000
|
139990
|
QDS_SHUTDOWN_QUEUE
|
2
|
120000
|
60000
|
0
|
Odd, no lock waits! We know we have a blocked
task because we’ve just created it but it isn’t showing here. Now this might be
because my LCK wait type isn’t in the top ten waits based on duration, it’s
unlikely, but we can alter our query to just return rows for the sort of wait type that we are
after:
SELECT TOP 10 * FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK%'
ORDER BY wait_time_ms DESC
wait_type
|
waiting_tasks_count
|
wait_time_ms
|
max_wait_time_ms
|
signal_wait_time_ms
|
LCK_M_UIX
|
0
|
0
|
0
|
0
|
LCK_M_SIX
|
0
|
0
|
0
|
0
|
LCK_M_SIU
|
0
|
0
|
0
|
0
|
LCK_M_IX
|
0
|
0
|
0
|
0
|
LCK_M_IU
|
0
|
0
|
0
|
0
|
LCK_M_IS
|
0
|
0
|
0
|
0
|
LCK_M_X
|
0
|
0
|
0
|
0
|
LCK_M_U
|
0
|
0
|
0
|
0
|
LCK_M_S
|
1
|
0
|
0
|
0
|
LCK_M_SCH_M
|
0
|
0
|
0
|
0
|
Here I can see one waiting task, I know in this test scenario its my task and that is the only thing running but ordinarily, because it's a cumulative value, it's difficult to pinpoint what is happening right now. It's also worth noting that although the waiting task count has incremented the wait time hasn't which begs the question, when do these wait
times get populated?
In order to check I'm going to roll back the transaction
by running the following command underneath the update statement (the same
window this time).
ROLLBACK TRAN
Let’s run the query again to check our running
totals now our query has finished and I'll leave the LCK type clause in there.
SELECT TOP 10 * FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK%'
ORDER BY wait_time_ms DESC
wait_type
|
waiting_tasks_count
|
wait_time_ms
|
max_wait_time_ms
|
signal_wait_time_ms
|
LCK_M_S
|
1
|
420051
|
420051
|
0
|
LCK_M_SCH_M
|
0
|
0
|
0
|
0
|
LCK_M_SCH_S
|
0
|
0
|
0
|
0
|
LCK_M_UIX
|
0
|
0
|
0
|
0
|
LCK_M_SIX
|
0
|
0
|
0
|
0
|
LCK_M_SIU
|
0
|
0
|
0
|
0
|
LCK_M_IX
|
0
|
0
|
0
|
0
|
LCK_M_IU
|
0
|
0
|
0
|
0
|
LCK_M_IS
|
0
|
0
|
0
|
0
|
LCK_M_X
|
0
|
0
|
0
|
0
|
Now that the transaction has finished we can
see that our cumulative values have been updated. I was initially surprised by
this as I always considered the DMV to contain “running” totals but looking on the
Books Online article on the DMV you will find the following line:
Returns
information about all the waits encountered by threads that executed.
Executed; that would be past tense! As we’ve
just demonstrated the duration values are only updated after a thread has finished executing. It does make sense as it is
an aggregated view after all but as such it does raise an important point; using
the sys.dm_os_wait_stats DMV is not the best option for real time
troubleshooting.
In order to do that a bit better we have to use
a different DMV and one at the task level rather than the summary view that
sys.dm_os_wait_stats gives us. So let’s run the update and select query from
our previous test again to create the same blocking scenario and after a few
seconds run the following script:
SELECT session_id, wait_type,
wait_duration_ms, blocking_session_id
FROM sys.dm_os_waiting_tasks
WHERE session_id > 50
This query selects from the
sys.dm_os_waiting_tasks management view and shows the tasks that are currently
waiting. Here I’m selecting the type of wait, the duration in milliseconds (that
we know isn’t available in the sys.dm_os_wait_stats view in real-time) and
finally I’ve included the blocking_session_id.
session_id
|
wait_type
|
wait_duration_ms
|
blocking_session_id
|
58
|
LCK_M_S
|
62790
|
56
|
Now we get a much better picture of which executing
tasks are currently waiting, what the
wait types are and importantly (and what we didn’t get before), for how long. You
can refresh the queries results to see the duration increase whilst the select
query is being locked.
We can now repeat the rollback transaction command
(or commit it depending how optimistic you’re feeling) that we did previously
and check to see that sys.dm_os_wait_stats has been updated accordingly.
So here are two different ways that we can use
SQL Servers wait statistics for troubleshooting purposes. Both views give us
really useful information but both have different purposes. If we wanted to
look back over time then the sys.dm_os_wait_stats will give us a view of wait time totals. Typically we would capture the information via a scheduled job and analyse the data for spikes during periods where issues might be suspected.
For performing real-time analysis of wait
statistics then we should base queries on the sys.dm_os_waiting_tasks view
where we can see accurate wait duration values as they are happening within our
instance.
In my opinion wait statistics are the most
important piece of information when troubleshooting SQL Server so learning
about the different types is vital for anyone using SQL. Thankfully there is a
wealth of really useful information about wait statistics out there; I’ve
listed some of my favourite links below.
More Reading.