Just to set the scene: we'd received alerts that a SQL instance was consistently above its CPU threshold of 85%; it was sitting at 97%. High CPU on a SQL instance would almost always make me jump straight into the transaction layer, but this particular box was a barely used non-production instance, so it was rather unlikely that a rogue query was bringing the server to its knees.
So before diving headfirst into SQL, we checked Task Manager to see what was actually using up resources on the server. AV, perhaps?
What we found was an executable called DatabaseMail.exe consuming roughly 90% of the CPU. Database Mail is a separate executable from SQL Server, though it runs under the same service account; a bit more info here: https://msdn.microsoft.com/en-us/library/ms190466.aspx
So what was firing Database Mail on a system that wasn't in use?
Checking the SQL error log soon revealed the culprit. We could see failed logins galore, all originating from an external server monitoring tool. These in turn were firing the severity 14 alerts that had been set up within SQL Agent (brilliant!), but the response was being emailed to an address that wasn't being monitored (boo!).
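If you'd rather do that check from a query window than SSMS's log viewer, the sketch below searches the current error log for failed logins; the parameter values and search string are illustrative rather than lifted from this incident.

    EXEC master.dbo.xp_readerrorlog
        0,               -- log file to read: 0 = the current log
        1,               -- log type: 1 = SQL Server error log, 2 = SQL Agent log
        N'Login failed'; -- filter the output to failed login messages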
Alerts are fantastic, but they're only as good as the action taken upon them. As they weren't being picked up, the errors continued every few seconds, and each time Database Mail would kick out another email, consuming CPU; this soon became a bottleneck and the server-wide alerts went into action.
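If you suspect Database Mail is churning away like this, a quick look at the queue in msdb gives a feel for the backlog; a rough sketch against the standard Database Mail tables:

    -- How many mail items are sitting in each state (unsent, sent, failed, retrying)
    SELECT sent_status, COUNT(*) AS items
    FROM   msdb.dbo.sysmail_allitems
    GROUP  BY sent_status;

    -- Any recent Database Mail errors or warnings
    SELECT TOP (20) log_date, event_type, description
    FROM   msdb.dbo.sysmail_event_log
    ORDER  BY log_date DESC;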
The login was created and assigned the necessary permissions. The errors ceased and the CPU settled down after a few minutes; that would be Database Mail working through the remainder of its queue.
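I didn't record the exact login or permissions, so the following is purely an illustrative sketch of that kind of fix; the DOMAIN\MonitoringTool name and the VIEW SERVER STATE grant are placeholders for whatever the monitoring tool genuinely requires.

    USE [master];
    -- Hypothetical login for the monitoring tool's service account
    CREATE LOGIN [DOMAIN\MonitoringTool] FROM WINDOWS;
    -- Placeholder permission; grant only what the tool actually needs
    GRANT VIEW SERVER STATE TO [DOMAIN\MonitoringTool];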
Unexpected? Yes. Many of the CPU issues I've dealt with have had a rogue query involved somewhere down the line, but as this incident proves, that won't be the case every time. By expecting the unexpected we assess problems carefully rather than diving in headfirst, and we put the necessary pieces together to find the root cause, which ultimately leads to a solution.