Wednesday 14 December 2016

A bit more on SQL Server alerting


If you ran the creation script from the previous post you’ll now have a group of alerts under SQL Agent that in Management Studio, they will look a bit like this:
The most important thing to remember about alerts is that no matter what, they will only ever be as effective as the response so this is where a sensible approach to alerting is required.
But what do I mean by sensible? Typically I see a number of problems with alerting setups; either alerts are inadequate and don’t cover the necessary errors (or there are none at all) but I also see the notifications to alerts not being set up correctly meaning problems go backwards and forwards delaying any fixes.
The other problem I see is an over provision of alerts. This usually is because one or more other monitoring systems have been deployed and error notifications have been duplicated as a result. Imagine having an operational tool like System Centre, some SQL monitoring software and native alerting all pinging the same message to the one recipient mailbox. Now on top of that let’s say the alerts have not been configured correctly so information emails are being issued every second. It’s a scary thought but it is easy to see how a critical error might be missed in this scenario.
So how to resolve these types of problems?
In the first instance having the alerts set up from the last post will cover a number of errors, so that is good, what we need to do now is set up appropriate responses. Natively we do this by either setting the alert to notify an operator or we can use it to trigger a SQL Agent job.
Please note at this point we’re working on an assumption that database mail has been setup and the profile set in SQL Agent.
Under SQL Agent if we right click Operators and select New Operator the following screen is displayed (I’ve added a random email address in this example):
Here’s the T-SQL to create an operator:
EXEC msdb.dbo.sp_add_operator @name=N'DBA',
              @enabled=1,
              @weekday_pager_start_time=90000,
              @weekday_pager_end_time=180000,
              @saturday_pager_start_time=90000,
              @saturday_pager_end_time=180000,
              @sunday_pager_start_time=90000,
              @sunday_pager_end_time=180000,
              @pager_days=0,
              @email_address=N'database-management@company.com',
              @category_name=N'[Uncategorized]'
GO


To have this operator informed of any errors covered by alerts we click on Notifications and tick any alerts we want to notify the operator of. As the email address is used for potentially critical notifications I do not like to use a personal E-mail address, if that person is off or even leaves then this can obviously leave alerts unnoticed so I prefer to use a shared mailbox where team members can have access. Note: as part of a SQL audit check any alerts for invalid E-Mail addresses.
The above example shows the database management mailbox being assigned to all the alerts we had set up so we’ve achieved our first aim. We can also double click the alert in Management Studio, click Response and configure the operator like in the example below:
The T-SQL to setup a notification is as follows (notification_method 1 being E-mail)
USE msdb ;
GO

EXEC dbo.sp_add_notification
 @alert_name = N'Severity 017 - Insufficient Resource',
 @operator_name = N'DBA',
 @notification_method = 1 ;
GO
 
The next thing to think about is utilising different operators. I'm not saying to not include the DBA on alerts but there are certainly scenarios where you want other teams to be notified in the very first instance of an error happening.
Other teams may want to see performance alerts or user based errors perhaps, certainly in testing or development systems (another rule, YES, you should have alerts set up in non production). As mentioned before, the trick is to be relevant; development might not want to be notified of hardware based issues like an Error 825 alert but you can be certain infrastructure or storage teams would. Alternatively, perhaps some errors can be directed to a service desk as part of an incident handling process?
EDIT: On a side note take the same approach with SQL Agent job failures. If you find yourself forwarding on failure messages to other mailboxes then they need adding as an operator and including in the notification process!
Finally we need to cover the over alerting and for this I’ll use the Page Life Expectancy performance based alert. If you double click the alert in Management Studio and click Options; the following screen is displayed:
This allows us to include the alert error text in the notification E-mail and also add additional text to any message. My preference here is to add a brief description of the error and perhaps some next steps; Errors 823-825 should be followed up with a consistency check so it’s good to add this bit of advice to the alert.
Back to the over alerting bit; this screen also allows us to specify a delay between responses. For performance alerts you HAVE to do this, imagine if our Page Life Expectancy had dipped under the threshold for a considerable amount of time; we’d get alerted hundreds if not thousands of times and if a critical alert had occurred in the midst of the notifications it very easily could get missed. This makes setting an appropriate delay to the alert very important and it is dependent on both the type of alert and the instance.
The final point to go over is where different monitoring systems are used in addition to our SQL Agent alerts. In this case it is very easy to end up with two or three notifications for the same alert and whilst it’s nice to have all bases covered the reality is that it can be a real pain.
Overall there is no set practice and it is very dependent on what tools you already have installed. If you have SQL monitoring applications already running then I would leave the performance related alerts to those and perhaps other alerts like failed logins too. Things can get even more complicated if in addition to native and third party monitoring tools for SQL you also have a cross platform monitoring application like SCOM with the SQL monitoring included also. In this case, from a SQL point of view I tend to leave SCOM for the high level alerts; service and resource related errors, those kinds of things.
SQL’s native alerting functionality will give you a good practical layer of error reporting and if you don’t have the budget for third party tools this can be a god send. Whether you should buy third party tools is another question altogether. For busy multi-server estates where there are lots of tier 1 instances then I would say they are pretty much essential.
Overall there are a number of ways we can handle errors coming from SQL Server. They key to doing it successfully is making the most out of what resource we have available and using it sensibly. A carefully planned alerting setup with different operators/recipients is a very useful approach so think about who needs to know about the different error types and tailor your alerts accordingly. Configure alerts (from all sources) to not flood mailboxes and you'll have a seriously effective error handling process at your fingertips! 

No comments:

Post a Comment