From SQLMail to SMTP in the SQL Agent

 

By Bill Wunder

 

In an earlier article, “Take my SQLMail, Please!”, I presented a case for an alternative for SQL mail. Then I described a stepwise strategy for moving away from xp_sendmail by first decoupling application stored procedures from the action of sending an email in the article “From xp_sendmail to safe_sendmail”  and then I suggested a way to replace xp_sendmail with an Simple Mail Transport Protocol (SMTP) relay based extended stored procedure in the backend of the now asynchronous call to send emails in the article “From xp_sendmail to xp_smtp_sendmail”. Now, to wrap up this discussion aimed at completely replacing SQLMail, I turn the attention toward the SQL Agent. Is easier than most would imagine to remove any SQL Agent dependencies on MAPI and the Outlook Client in SQL Server 2000. As an added bonus I’ll also show you a way to provide much better messages that those that SQLMail provides through the SQLAgent Notification option.

 

As I replace SQLMail, I will preserve the key benefits of SQLAgent Notifications. The features that the SQLAgent’s SQLMail provides that I want to continue to support include job failure email notification, job completion email notification, and SQL Agent Alerts email notification. On the other hand, in our shop anyway, SQLMail’s reliance on msdb.dbo.sysoperators to predetermine who can - and by their exclusion who cannot - receive an email from the SQLAgent is more trouble than it’s worth so we will not perpetuate this well hidden layer of maintenance and any associated pre-configuration requirement. We will provide an alternative to the Agent’s “Fail-Safe Operator” by supporting a default recipient for all safe_sendmail based emails. In reality, the default recipient is broader in scope than the “Fail-Safe Operator” because it applies to all emails, not just SQLAgent originated emails. Bu sure to look back at the “From xp_sendmail to xp_smtp_sendmail” article and the script for the SendAsyncEmail stored procedure to see how this mechanism works.

 

I will also add a handful of useful features to my SMTP based replacement that you don’t get with the out of the box SQLMail. First, all messages sent by the SQL Agent will include the actual error messages that caused an SQLAgent job to fail in the failure notification message rather than the generic failure message that SQLMail emits. Additionally, the new solution will provide an option to archive job failures to support the tracking of failures over time. Finally, because it uses the safe_sendmail T-SQL interface to send it’s email, the replacement will include an archive of all messages sent in a table within the SQL Server and is adaptable to any SQLMail alternative technology you might decide to use.

 

Here’s where I get to back off a little and explain that I think SQL Server is a great product even though it only talks MAPI  in the email community. If it weren’t for broad selection of tools available within SQL Server, it would not be possible to come up with something better than SQLMail.

 

Red Alert

One welcome tool is the fine assortment of tokens available in the SQLAgent. In case you aren’t familiar with the tokens, check out the Books on Line documentation for sp_add_jobstep. Even though the token documentation is unfortunately weak, one thing you’ll notice right away is that tokens and alerts were meant for each other. If you look over the script I'll use in the command of the only step in this job you'll notice that simply by the use of tokens, i can produce a message that will tells me everything about the alert that the under the covers Alert notification does.  The tokens are going to provide the error inf and even the database where the error occurred.

 


 

exec sysmon.dbo.safe_sendmail 
  @recipients = 'DBA@MyCompany.com'
  , @subject = 'SQL Server Alert System: alert condition occurred on [SRVR]'
  , @message = 'Error: [A-ERR]
       Severity: [A-SEV]
       Date: [STRTDT]
       Time: [STRTTM]
       Database: [A-DBN]
       Message: [A-MSG]  

       Check the [SRVR] SQL Server ErrorLog and the Application event log on the server for

       additional details'
 


 

This makes the task of capturing the characteristics of an alert very straight forward in a SQL Agent job step. With that in mind, I’m going to turn my alerts over to an agent job that I’ll name “w Alert Notification”.  This job is going be an easy interface to the Agent Alerts because it will take advantage of one of those underused abilities that most of us have looked at many times and managed to easily ignore. You can tell any alert to execute a job. Simply open an Alert from Enterprise Managers left pane, go to the Response Tab and select the “Execute job:” check box.  This will allow you can browse for any job that already exists on the server.

 

 

 

This means that in order to use the GUI it is necessary to create the job before you can call it from an Alert. It’s worth noting that once the job is created you can also add Alerts to the system by going to Job Schedule Tab of the job and selecting the “Add Schedule” button. This takes you to the same dialog as adding an alert directly from the Alerts folder of the Enterprise Manager left pane.

 

 

 

 

Naturally, I prefer a script to the GUI. If you have multiple servers to convert I'm betting you’ll understand why. You'd have to do a lot of clicking to move all the existing Alerts into the “w Alert Notification” job. Take a look at the script to add the “w Alert Notification Job” and then modify all existing Alerts to use the job. The script doesn’t include statements to remove any existing Operators from the jobs because it is necessary to review which Operators need to be notified of which Alerts and add this information to the script. I removed the Operators on my systems as I collected the information so I didn’t need to do it again in this script. (My fall back plan here was to use the scripts I had previously stored in SourceSafe using “Bill Wunder’s DDL Archive Utility”. Fortunately I didn’t need to go there either, but it's always best to have a fall back plan.)  If you find that you require more than one distribution list you may want to consider creating enough different notification jobs to cover all use cases. It’s even conceivable that you may want to continue to use Agent Operators in NET SEND mode in your alerts in which case you wouldn’t want to remove them though I find that NET SEND's value is dubious except in very rare cases. I would recommend that you always include the DBA (that’s you!) as a recipient for all Alerts. This will be your best indication of how much the system is actually getting used.

 

Getting the Message

OK, now we have alerts using the SMTP solution. All that’s left is to get the SQLAgent Job Notifications feeding the AsyncEmailQueue and the need to install Outlook on any SQL Server is Gone!

 

Since I’m not using Agent Operators, the built-in Notification ability won’t do. I’ll need to add a job step that will handle both job failure notification and completion with success notification to all jobs that require notification. Truth is if you want any kind of reliable failure notification you need to use the built-in Notification else a separate job step. If you rely on a mail to be sent from within a T-SQL job step or from within a procedure called from the job step and an error occurs that is “fatal to the batch” you’ll never know that a failure occurred. In an extension of that reality, if you rely on the built-in Notification you’ll stand a better chance of getting notified (if SQL Mail doesn’t let you down just when you needed it) but you’ll have to go the extra mile to actually find out why the job step failed. The easiest way to see the actual error is to view the Job History in Enterprise Manager, remembering of course to check the box that gives you all the details. Many times this method is not desirable. For one, you need to be a user that has the necessary permission to view the history. Always nice, for example, to get called at 2 AM by someone that wants to know why their Job failed so they can correct the problem and move on. Won’t it be so much nicer when they get he actual error in the email that tells them that the Job failed?

 

To get the error all you do is query msdb.dbo.sysjobhistory. Of course you have to join it to msdb.dbo.sysjobs by guid to find a row by job name: an excellent 3AM sobriety test. What I found to work better is to have a job step that all job steps are sent to on failure. In this job step, I query msdb.dbo.sysjobhistory and I insert the error message into a table in the admin database. That way, no matter how many jobs I have writing to msdb.dbo.sysjobhistory and irregardless of the history settings for the SQL Agent (and I always find the 1000 rows total, 100 rows per job default setting woefully inadequate) I can make sure I can go back to the actual error in the morning or on Monday or after I get back from Vacation. Even if the msdb.dbo.sysjobhistory has been overwritten. Even if a crafty developer has cleared the job history to cover their tracks- but that never really happens right? Have a look at the script to create the table and the two stored procedures necessary to handle all of your Job Notifications. These components can be used without implementing the SMTP mail facility to provide enhanced error reporting and error tracking even if you choose to stick it out with Outlook. The components are intended for use in the admin database so you can keep everyone from mucking around.

 

There are two procedures: log_job_failure is intended to replace the  built-in “When the job fails” mechanism and log_job_completion that replaces both the “When the job succeeds” and “When the job completes” mechanisms. Basically, log_job_failure will always cause the job to end in failure and always sends a message with “JOB FAILED” in the subject. log_job_completion can send either a “JOB SUCCEEDED” message and leave the job in a happy state or it can duplicate the log_job_failure behavior if the job is in a failed condition.

 

It was slightly tricky for everyone to get used to the job flow requirements of these jobs, but not a big deal for anyone. Here’s what the job step flow would look like for a failure only condition.

 

 

 

 

 

And this for an always notify condition. Everything is the same except the last step before the notification step continues into the notification step rather than “Quitting with success” as it does for failure only notification.

 

 

 

 

 

The job step for both is identical and very simple. Just provide the correct procedure name and the recipients that you want to be notified: admin.dbo.log_job_error if all you care about are failures and admin.dbo.log_job_completion if the notification step needs to send an email each time the job finishes. All the smarts is in the T-SQL so you can modify the logic without having to revisit every job if you need to tweak either procedure to suit your environment.

 

 

If you didn’t want to capture the error messages to a table it would be a simple matter to modify the procedures and omit the table. I have to warn you though, I’ve found that table useful on more than one occasion and having a history of failures over several weeks has also been repeatedly useful.

 

So there you have it. A complete replacement for xp_sendmail and SQLMail. We’ have explored a solution that is asynchronous, keeps a history of sent messages, maintains the kept history, uses a  standard user interface that mimics xp_sendmail, enhances the messages sent by the SQL Agent by including actual error messages rather than the generic failure message that the SQL Agent built-in notification facility puts out, provides full support for SQL Agent Alerts and is adaptable to any email technology you want to use. This solution makes it very easy to deploy and change the underlying email technology easily because the interface is centralizing in a single stored procedure: safe_sendmail.

 

Try it out. I’m sure there are lots of ways to make it better and many other ways to solve the general problem. Of one thing I am certain. My SQL Servers run better and longer between the reboots necessary to replenish depleted memory with this alternative than they ever did with SQLMail.

 

Bill