If you ran the creation script from the previous post you’ll
now have a group of alerts under SQL Agent that in Management Studio, they will look
a bit like this:
The most important thing to remember about alerts is that
no matter what, they will only ever be as effective as the response so this is
where a sensible approach to alerting is required.
But what do I mean by sensible? Typically I see a number
of problems with alerting setups; either alerts are inadequate and don’t cover
the necessary errors (or there are none at all) but I also see the notifications to alerts
not being set up correctly meaning problems go backwards and forwards delaying any fixes.
The other problem I see is an over provision of alerts.
This usually is because one or more other monitoring systems have been deployed
and error notifications have been duplicated as a result. Imagine having an
operational tool like System Centre, some SQL monitoring software and native
alerting all pinging the same message to the one recipient mailbox. Now on top
of that let’s say the alerts have not been configured correctly so information
emails are being issued every second. It’s a scary thought but it is easy to
see how a critical error might be missed in this scenario.
So how to resolve these types of problems?
In the first instance having the alerts set up from the
last post will cover a number of errors, so that is good, what we need to do now is
set up appropriate responses. Natively we do this by either setting the alert
to notify an operator or we can use it to trigger a SQL Agent job.
Please
note at this point we’re working on an assumption that database mail has been
setup and the profile set in SQL Agent.
Under SQL Agent if we right click Operators and select
New Operator the following screen is displayed (I’ve added a random email
address in this example):
Here’s the T-SQL to create an operator:
EXEC msdb.dbo.sp_add_operator @name=N'DBA',
@enabled=1,
@weekday_pager_start_time=90000,
@weekday_pager_end_time=180000,
@saturday_pager_start_time=90000,
@saturday_pager_end_time=180000,
@sunday_pager_start_time=90000,
@sunday_pager_end_time=180000,
@pager_days=0,
@email_address=N'database-management@company.com',
@category_name=N'[Uncategorized]'
GO
To have this operator informed of any errors covered by
alerts we click on Notifications and tick any alerts we want to notify the
operator of. As the email address is used for potentially
critical notifications I do not like to use a personal E-mail address, if that person
is off or even leaves then this can obviously leave alerts unnoticed so I prefer to
use a shared mailbox where team members can have access. Note: as part of a SQL audit check any alerts for invalid E-Mail addresses.
The above example shows the database management mailbox being
assigned to all the alerts we had set up so we’ve achieved our first aim. We
can also double click the alert in Management Studio, click Response and
configure the operator like in the example below:
The T-SQL to setup a notification is as follows (notification_method 1 being
E-mail)
USE msdb ;
GO
EXEC dbo.sp_add_notification
@alert_name = N'Severity 017 - Insufficient Resource',
@operator_name =
N'DBA',
@notification_method =
1 ;
GO
The next thing to think about is utilising different
operators. I'm not saying to not include the DBA on alerts but there are certainly scenarios where you want other teams to be notified in the very first instance of an error happening.
Other teams may want to see performance alerts or user based errors perhaps, certainly in testing or development systems (another rule, YES, you should have alerts set up in non production). As
mentioned before, the trick is to be relevant; development might not want to be
notified of hardware based issues like an Error 825 alert but you can be certain
infrastructure or storage teams would. Alternatively, perhaps some errors can
be directed to a service desk as part of an incident handling process?
EDIT: On a side note take the same approach with SQL
Agent job failures. If you find yourself forwarding on failure messages to
other mailboxes then they need adding as an operator and including in the
notification process!
Finally we need to cover the over alerting and for this I’ll
use the Page Life Expectancy performance based alert. If you double click the
alert in Management Studio and click Options; the following screen is
displayed:
This allows us to include the alert error text in the
notification E-mail and also add additional text to any message. My preference
here is to add a brief description of the error and perhaps some next steps;
Errors 823-825 should be followed up with a consistency check so it’s good to
add this bit of advice to the alert.
Back to the over alerting bit; this screen also allows us to
specify a delay between responses. For performance alerts you HAVE to do this,
imagine if our Page Life Expectancy had dipped under the threshold for a
considerable amount of time; we’d get alerted hundreds if not thousands of
times and if a critical alert had occurred in the midst of the notifications it
very easily could get missed. This makes setting an appropriate delay to the
alert very important and it is dependent on both the type of alert and the
instance.
The final point to go over is where different monitoring
systems are used in addition to our SQL Agent alerts. In this case it is very
easy to end up with two or three notifications for the same alert and whilst it’s
nice to have all bases covered the reality is that it can be a real pain.
Overall there is no set practice and it is very dependent
on what tools you already have installed. If you have SQL monitoring
applications already running then I would leave the performance related alerts
to those and perhaps other alerts like failed logins too. Things can get even
more complicated if in addition to native and third party monitoring tools for
SQL you also have a cross platform monitoring application like SCOM with the
SQL monitoring included also. In this case, from a SQL point of view I tend to
leave SCOM for the high level alerts; service and resource related errors,
those kinds of things.
SQL’s native alerting functionality
will give you a good practical layer of error reporting and if you don’t have
the budget for third party tools this can be a god send. Whether you should buy
third party tools is another question altogether. For busy multi-server estates where
there are lots of tier 1 instances then I would say they are pretty much essential.
Overall there are a number of ways we can handle errors
coming from SQL Server. They key to doing it successfully is making the most
out of what resource we have available and using it sensibly. A carefully
planned alerting setup with different operators/recipients is a very useful
approach so think about who needs to know about the different error types and tailor
your alerts accordingly. Configure alerts (from all sources) to not flood mailboxes and you'll have a seriously effective error handling process at your fingertips!
No comments:
Post a Comment