Monitoring batch jobs in IT environments: using emails efficiently

Automated scripts running in production environments are sometimes monitored simply by email. These scripts may run on a schedule or be triggered by events, and they send an email (for example, to level 1 technical support) when the job completes. The content of the email then reports the outcome of the script's execution.

This post lists a few ideas you might find useful to implement in such an environment.

  1. Avoid sending emails when the job completed successfully: there is no point in swamping technical support with emails that require no action. Of course, you'll then need a way to tell a script that ran fine and sent no email apart from one that sent no email because it never ran at all 🙂
  2. Keep the email subject specific: for example, the name of the script and its status (OK or KO). This speeds up processing of the emails, whether by a human or by mail rules.
  3. Clearly state the priority of the email in the subject. This helps decide which problems to handle first when several occur at once.
  4. Clearly state which script produced the email: it IS annoying to have to figure out WHAT broke before you can even START analyzing and recovering…
  5. The body of the email should clearly state the error condition.
  6. When possible, the body of the email should give the recovery procedure. If that is not possible, it should point to the relevant documentation.
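As an illustration, here is a minimal Python sketch of tips 1 to 6 using only the standard library's `email.message.EmailMessage`. The function names `build_alert` and `send_alert`, the `P1` to `P3` priority labels, and the example addresses are hypothetical placeholders, not part of any existing tool; adapt them to your own conventions.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional

# Hypothetical priority labels; adapt to your support team's conventions.
PRIORITIES = ("P1", "P2", "P3")


def build_alert(
    script_name: str,
    succeeded: bool,
    priority: str,
    error: str = "",
    recovery: str = "",
    doc_url: str = "",
    sender: str = "batch@example.com",          # placeholder address
    recipient: str = "support-l1@example.com",  # placeholder address
) -> Optional[EmailMessage]:
    """Return an alert email for a failed run, or None when the run succeeded."""
    if succeeded:
        # Tip 1: no email when nothing needs doing. Pair this with a
        # separate "did the job run at all?" check.
        return None
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Tips 2 to 4: a fixed, rule-friendly subject with priority, script and status.
    msg["Subject"] = f"[{priority}] {script_name}: KO"

    lines = [
        f"Script: {script_name}",               # tip 4: what broke
        f"Error: {error or 'unknown error'}",   # tip 5: the error condition
    ]
    if recovery:
        lines.append(f"Recovery procedure:\n{recovery}")  # tip 6
    elif doc_url:
        lines.append(f"Documentation: {doc_url}")          # tip 6 fallback
    msg.set_content("\n".join(lines) + "\n")
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost") -> None:
    """Send the alert through an SMTP relay (assumed to exist on `host`)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

For example, `build_alert("nightly_export.sh", succeeded=False, priority="P2", error="exit code 3: target directory full", doc_url="https://wiki.example.com/runbooks/nightly_export")` yields a message with the subject `[P2] nightly_export.sh: KO`. Keeping the subject format fixed is what makes mail-client rules and filters reliable.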

So those are my “best practices”, based on field experience. Have more tips or rules? Share them in the comments!

2 thoughts on “Monitoring batches in IT environment : efficiently using emails”

  1. Hello, cronic is my friend, and with it I solve almost all of my problems. Combined with Nagios and a script that checks the log file's modification time to confirm the job ran, it's almost bulletproof ;o)
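The log-file check described in this comment can be sketched as a small Nagios-style plugin in Python. It assumes the job writes to a log file every time it runs; the function names, arguments and threshold are hypothetical. It follows the Nagios plugin convention of printing one status line and exiting with 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import os
import time
from typing import List, Optional, Tuple

# Exit codes from the Nagios plugin convention.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_log_freshness(
    path: str, max_age_s: float, now: Optional[float] = None
) -> Tuple[int, str]:
    """Return CRITICAL if the job's log was not modified within max_age_s seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        # No log at all: assume the job never ran.
        return CRITICAL, f"CRITICAL - {path} not found"
    except OSError as exc:
        # Log exists but cannot be read: we cannot tell whether the job ran.
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc}"
    age = now - mtime
    if age > max_age_s:
        return CRITICAL, f"CRITICAL - {path} last modified {age:.0f}s ago (limit {max_age_s:.0f}s)"
    return OK, f"OK - {path} last modified {age:.0f}s ago"


def main(argv: List[str]) -> int:
    """Hypothetical CLI: LOGFILE MAX_AGE_SECONDS -> prints status, returns exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_log_freshness LOGFILE MAX_AGE_SECONDS")
        return UNKNOWN
    code, message = check_log_freshness(argv[0], float(argv[1]))
    print(message)
    return code
```

Called from a script entry point as `sys.exit(main(sys.argv[1:]))`, this flags exactly the case from tip 1: a job that sent no email because it never ran.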
