Monday, 21 November 2016

TRACEWRITE and querying sys.traces


This post is about a wait type that I have seen being safely ignored by admins in the past but actually it does highlight an particular that I like to check up on in systems that I’m maintaining.

The wait type I’m referring to is TRACEWRITE and it’s easy to see why it often gets classed as being safe to ignore. SQL Server does run a default trace continuously so naturally this wait will show up in wait stat queries for your instances and be quite prominent (like the one below) but that said; whilst it’s relatively safe for the default trace to be ruled out of any further investigation what about traces that have been started by other users?



I typically see a scenario in non-production systems where a trace has been turned on to monitor a particular process for testing and then forgotten about and not stopped again once the process has finished. Tracing is not particularly cost friendly at the best of times so one running way beyond its expected duration is something you do want to know about.

The following query looks at what traces are currently running on the system:
SELECT
start_time,
stop_time,
[status],
[path],
is_default,
reader_spid
FROM sys.traces
 
This query returns information on all the current traces running on your system. It’s worth adding at this point that tracing to some degree (I still see tracing, a lot! has been replaced by extended events which are much more lightweight in operation, so another reason to be checking. Here's what a default trace looks like:


The columns returned are pretty self-explanatory, a value of 1 in the status column means the trace is currently running (0 = stopped) and you can use the reader_spid column to return information on who is using it, and who to ask about turning the trace off.


To summarise; in most cases trace related waits like TRACEWRITE are safe to ignore but it's good practice to routinely run a check on those systems where people might be running traces or you could even set up jobs that will automatically stop any traces outside of working hours.


One final point, it is very common for monitoring applications to sue some form of trace analysis, these could well be part of routine maintenance so prior to switching anything off ensure you know how the trace has originated and what its purpose is.

SQL Server 2016 - SP1: Warning!

Fresh on the heels of the news last week of Service Pack 1 being released for SQL Server 2016 comes a little word of warning. First the good news, SP1 has unlocked an awful lot of Enterprise functionality for the Standard, Web and even Express editions; more on that here. 


All of that is very good however a word of warning has been issued from the MSSQL Tiger Team if you are using SP1 on the "lower" version: "you might see some unforeseen errors or databases might even be left in suspect state after uninstallation of SQL Server 2016 SP1. Even worse would be if the system databases are using new features for example, partitioned table in master database, it can lead to SQL Server instance unable to start after uninstalling SQL Server 2016 SP1".


If you have installed SP1 and now wish to uninstall it you need to ensure that any of the new features have been disabled and/or dropped from the system. The blog post comes with a handy script to verify if any of the functionality is enabled. You can read all about it (and some other warnings) here on the MSSQL Tiger Team blog, oh and the uninstall works fine in Enterprise Edition.



Monday, 7 November 2016

What is a DMO?

DMO stands for Dynamic Management Object; these are a set of system objects stored within SQL that allow us to look into our instance and retrieve various bits of information about its current state - and a bunch of other things!


The actual objects are DMVs (Dynamic Management Views) or DMFs (Dynamic Management Functions). These are exactly the same as "normal" views or functions although they are held within the system schema.


These objects are used in a wide range of activities and I consider them to be the fundamental building blocks of any monitoring or tuning script or application. There is of course a vast number of DMO's available and over the course of writing this blog I will be putting together some introductory articles on the ones that I commonly use.


Here are some links to get you going:


Building a monitoring script with DMOs


Troubleshooting with Wait Statistics




More reading:


SQL Server: Working with Dynamic Management Objects (via TechNet)

Wednesday, 2 November 2016

Troubleshooting with Wait Statistics

Troubleshooting is a fundamental activity for any DBA and when it comes to SQL Server one of the most important tasks is analysing the wait statistics that have occurred within the instance.

So what are wait statistics? In short whenever a task is “waiting” its type of wait and duration are recorded by SQL Server. This information is incredibly useful and we can query DMV's (Dynamic Management Views) to obtain this information that we can use to aid our troubleshooting efforts.

One particular view is the sys.dm_os_wait_stats DMV. This is an aggregated view that shows us the waits that have been occurring in our instance since the last restart or when the wait statistics were cleared down. Queries like the one below are typically sorted by the wait_time_ms column to indicate the most prominent wait types that have occurred over time in our SQL server.

SELECT TOP 10 * FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC

Running this query without as it is will produce a lot of different wait types so it’s important to know that many of them are likely to be a result of completely normal operations that are being undertaken by SQL Server. This isn’t the problem with sys.dm_os_wait_stats that I am going to demonstrate today though so I will keep the queries very simplistic. What I want to show you is that although this view is great as a summary, it isn’t necessarily helpful when it comes to real-time analysis.

I’ll start by creating a new table in a non-production database on a non-production instance of SQL Server and insert a handful of rows.

CREATE TABLE [dbo].[tblPeople](
       [ID] [smallint] IDENTITY(1,1) NOT NULL,
       [FirstName] [varchar](50) NULL,
       [LastName] [varchar](50) NULL
) ON [PRIMARY]

GO

USE [TestingDatabase]
GO

INSERT INTO tblPeople (FirstName, LastName) VALUES ('Peter','Morgan');
INSERT INTO tblPeople (FirstName, LastName) VALUES ('Sarah','Jones');
INSERT INTO tblPeople (FirstName, LastName) VALUES ('Liam','Davies');
INSERT INTO tblPeople (FirstName, LastName) VALUES ('Joanne','Jenkins');

In order to remove any previous information and clear down the wait statistics we have to run the following DBCC command:

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

To force a wait in SQL I will create a blocking scenario by issuing an update command on our new table without committing or rolling back the transaction.

BEGIN TRAN

UPDATE tblPeople
SET LastName = 'Davies' WHERE LastName = 'Jenkins'

In another window (to stress, use another window) I will try to select data from that table knowing that I won’t be able to because it is being updated by another transaction. As it cannot acquire a lock my select query it should start to accumulate a LCK wait type.

SELECT TOP 1000 [ID]
      ,[FirstName]
      ,[LastName]
  FROM [TestingDatabase].[dbo].[tblPeople]

I’ll leave that query running for a few seconds and then look at sys.dm_os_wait_satas again and as mentioned previously I’ll order by wait_time_ms descending so I can see the worst offenders in my instance since the stats were cleared down. At this point I might expect that I will see a lock wait type appear very high in the list.

SELECT TOP 10 * FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC



wait_type
waiting_tasks_count
wait_time_ms
max_wait_time_ms
signal_wait_time_ms
SP_SERVER_DIAGNOSTICS_SLEEP
531
300000
299995
300000
LOGMGR_QUEUE
1109
142871
156
6
DIRTY_PAGE_POLL
1417
142842
122
16
SLEEP_SYSTEMTASK
9041
142832
54
98
HADR_FILESTREAM_IOMGR_IOCOMPLETION
285
142670
503
7
LAZYWRITER_SLEEP
150
142097
1008
5
SQLTRACE_INCREMENTAL_FLUSH_SLEEP
35
140008
4000
0
XE_TIMER_EVENT
54
140000
4999
140000
REQUEST_FOR_DEADLOCK_SEARCH
28
139990
5000
139990
QDS_SHUTDOWN_QUEUE
2
120000
60000
0

Odd, no lock waits! We know we have a blocked task because we’ve just created it but it isn’t showing here. Now this might be because my LCK wait type isn’t in the top ten waits based on duration, it’s unlikely, but we can alter our query to just return rows for the sort of wait type that we are after:

SELECT TOP 10 * FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK%'
ORDER BY wait_time_ms DESC



wait_type
waiting_tasks_count
wait_time_ms
max_wait_time_ms
signal_wait_time_ms
LCK_M_UIX
0
0
0
0
LCK_M_SIX
0
0
0
0
LCK_M_SIU
0
0
0
0
LCK_M_IX
0
0
0
0
LCK_M_IU
0
0
0
0
LCK_M_IS
0
0
0
0
LCK_M_X
0
0
0
0
LCK_M_U
0
0
0
0
LCK_M_S
1
0
0
0
LCK_M_SCH_M
0
0
0
0

Here I can see one waiting task, I know in this test scenario its my task and that is the only thing running but ordinarily, because it's a cumulative value, it's difficult to pinpoint what is happening right now. It's also worth noting that although the waiting task count has incremented the wait time hasn't which begs the question, when do these wait times get populated?

In order to check I'm going to roll back the transaction by running the following command underneath the update statement (the same window this time).

ROLLBACK TRAN

Let’s run the query again to check our running totals now our query has finished and I'll leave the LCK type clause in there.

SELECT TOP 10 * FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK%'
ORDER BY wait_time_ms DESC





wait_type
waiting_tasks_count
wait_time_ms
max_wait_time_ms
signal_wait_time_ms
LCK_M_S
1
420051
420051
0
LCK_M_SCH_M
0
0
0
0
LCK_M_SCH_S
0
0
0
0
LCK_M_UIX
0
0
0
0
LCK_M_SIX
0
0
0
0
LCK_M_SIU
0
0
0
0
LCK_M_IX
0
0
0
0
LCK_M_IU
0
0
0
0
LCK_M_IS
0
0
0
0
LCK_M_X
0
0
0
0


Now that the transaction has finished we can see that our cumulative values have been updated. I was initially surprised by this as I always considered the DMV to contain “running” totals but looking on the Books Online article on the DMV you will find the following line:

Returns information about all the waits encountered by threads that executed.

Executed; that would be past tense! As we’ve just demonstrated the duration values are only updated after a thread has finished executing. It does make sense as it is an aggregated view after all but as such it does raise an important point; using the sys.dm_os_wait_stats DMV is not the best option for real time troubleshooting.

In order to do that a bit better we have to use a different DMV and one at the task level rather than the summary view that sys.dm_os_wait_stats gives us. So let’s run the update and select query from our previous test again to create the same blocking scenario and after a few seconds run the following script:

SELECT session_id, wait_type, wait_duration_ms, blocking_session_id
FROM sys.dm_os_waiting_tasks
WHERE session_id > 50



This query selects from the sys.dm_os_waiting_tasks management view and shows the tasks that are currently waiting. Here I’m selecting the type of wait, the duration in milliseconds (that we know isn’t available in the sys.dm_os_wait_stats view in real-time) and finally I’ve included the blocking_session_id.





session_id
wait_type
wait_duration_ms
blocking_session_id
58
LCK_M_S
62790
56

Now we get a much better picture of which executing tasks are currently waiting, what the wait types are and importantly (and what we didn’t get before), for how long. You can refresh the queries results to see the duration increase whilst the select query is being locked.

We can now repeat the rollback transaction command (or commit it depending how optimistic you’re feeling) that we did previously and check to see that sys.dm_os_wait_stats has been updated accordingly.

So here are two different ways that we can use SQL Servers wait statistics for troubleshooting purposes. Both views give us really useful information but both have different purposes. If we wanted to look back over time then the sys.dm_os_wait_stats will give us a view of wait time totals. Typically we would capture the information via a scheduled job and analyse the data for spikes during periods where issues might be suspected.

For performing real-time analysis of wait statistics then we should base queries on the sys.dm_os_waiting_tasks view where we can see accurate wait duration values as they are happening within our instance.

In my opinion wait statistics are the most important piece of information when troubleshooting SQL Server so learning about the different types is vital for anyone using SQL. Thankfully there is a wealth of really useful information about wait statistics out there; I’ve listed some of my favourite links below.


More Reading.