One of the common problems in Unix administration is the undetected crash of a system daemon somewhere in your mail or web processing chain. On our system, ClamAV is a likely candidate for startup problems after an automated virus database refresh.
Recently, after a crash of the clamav-milter daemon, Postfix temporarily rejected all incoming mail, because that is its default behavior when a milter cannot be reached (milter_default_action = tempfail).
We looked for a monitoring solution. Besides Mon, with its tricky configuration syntax, I found Monit in Debian. Several good configuration examples are available, and I was able to copy and paste a config file for the most important daemons in just a minute:
[source:C]
# poll all services every 60 seconds
set daemon 60
set logfile syslog facility log_daemon
set mailserver some.other.mailhost,
               localhost
# queue alerts on disk while no mail server is reachable
set eventqueue
    basedir /var/monit
    slots 100
set mail-format { from: root@monitored.host }
set alert peter@some.other.mailhost

check system monitored.host
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert

check process postfix with pidfile /var/spool/postfix/pid/master.pid
    group mail
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout

check process courier-imap with pidfile /var/run/courier/imapd.pid
    start program = "/etc/init.d/courier-imap start"
    stop program = "/etc/init.d/courier-imap stop"
    if 5 restarts within 5 cycles then timeout
    if failed port 143 type TCP protocol IMAP then restart

check process courier-imap-ssl with pidfile /var/run/courier/imapd-ssl.pid
    start program = "/etc/init.d/courier-imap-ssl start"
    stop program = "/etc/init.d/courier-imap-ssl stop"
    if 5 restarts within 5 cycles then timeout
    if failed port 993 type TCPSSL protocol IMAP then restart

check process courier-pop with pidfile /var/run/courier/pop3d.pid
    start program = "/etc/init.d/courier-pop start"
    stop program = "/etc/init.d/courier-pop stop"
    if 5 restarts within 5 cycles then timeout
    if failed port 110 type TCP protocol POP then restart

check process courier-pop-ssl with pidfile /var/run/courier/pop3d-ssl.pid
    start program = "/etc/init.d/courier-pop-ssl start"
    stop program = "/etc/init.d/courier-pop-ssl stop"
    if 5 restarts within 5 cycles then timeout
    if failed port 995 type TCPSSL protocol POP then restart

check process spamd with pidfile /var/run/spamd.pid
    start program = "/etc/init.d/spamassassin start"
    stop program = "/etc/init.d/spamassassin stop"
[/source]
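The "if 5 restarts within 5 cycles then timeout" lines are the safety valve that keeps Monit from restarting a broken daemon forever. To make the rule's meaning concrete, here is a simplified model in C: a ring buffer remembers, for the last M poll cycles, whether a restart happened, and the service is given up once N restarts fall inside that window. This is a hypothetical sketch of the semantics, not Monit's source code; the names restart_window, window_init and window_record are made up for the example.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CYCLES 64

/* Hypothetical model of "if N restarts within M cycles then timeout". */
struct restart_window {
    bool restarted[MAX_CYCLES]; /* ring buffer: one slot per poll cycle */
    int cycles;                 /* window length M */
    int limit;                  /* restart count N */
    int pos;                    /* next slot to overwrite */
};

void window_init(struct restart_window *w, int limit, int cycles)
{
    memset(w, 0, sizeof *w);
    if (cycles < 1)
        cycles = 1;
    if (cycles > MAX_CYCLES)
        cycles = MAX_CYCLES;
    w->limit = limit;
    w->cycles = cycles;
}

/* Record the outcome of one poll cycle. Returns true once the last
 * `cycles` cycles contain at least `limit` restarts, i.e. the point at
 * which the service would be timed out instead of restarted again. */
bool window_record(struct restart_window *w, bool restarted)
{
    int i, count = 0;

    w->restarted[w->pos] = restarted;
    w->pos = (w->pos + 1) % w->cycles;
    for (i = 0; i < w->cycles; i++)
        if (w->restarted[i])
            count++;
    return count >= w->limit;
}
```

With window_init(&w, 5, 5), four consecutive calls to window_record(&w, true) return false and the fifth returns true, mirroring a daemon that crashes on every poll.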
For the crashed clamav-milter, however, it was not possible to add such a check ;-( The reason?
Monit always needs a valid PID file. Both freshclam and clamav-milter on my system create one, but they are started through the start_daemon function from /lib/lsb/init-functions instead of Debian's usual start-stop-daemon tool. The LSB library writes a PID file with a dash in front of the number ("-4711"), which Monit does not understand. So far, I am not sure whether the init-functions library has a bug or whether Monit should be able to handle such a PID file …
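To illustrate the problem, here is a minimal C sketch of a PID-file check that accepts both "4711" and the dashed "-4711" form. It assumes the PID file holds a single number on its first line and that sending signal 0 with kill() is an acceptable liveness test. The function names parse_pid_text and pidfile_process_alive are hypothetical, and this is not how Monit itself reads PID files.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse PID file contents. Tolerates one leading dash (the "-4711" form)
 * and trailing whitespace. Returns the PID, or -1 if invalid. */
pid_t parse_pid_text(const char *text)
{
    const char *p = text;
    char *end;
    long value;

    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '-')                  /* skip the dash written by start_daemon */
        p++;
    if (*p < '0' || *p > '9')       /* reject "--4711", "abc", empty input */
        return -1;
    errno = 0;
    value = strtol(p, &end, 10);
    if (errno != 0 || value <= 0 || (long)(pid_t)value != value)
        return -1;
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')               /* garbage after the number */
        return -1;
    return (pid_t)value;
}

/* Returns 1 if the PID file names a running process, 0 otherwise. */
int pidfile_process_alive(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    pid_t pid;

    if (f == NULL)
        return 0;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return 0;
    }
    fclose(f);
    pid = parse_pid_text(buf);
    if (pid <= 0)
        return 0;
    /* Signal 0 only checks existence; EPERM means the process exists
     * but belongs to another user. */
    return kill(pid, 0) == 0 || errno == EPERM;
}
```

A wrapper built on such a parser could, for example, rewrite the file without the dash so that Monit can read it; whether that is preferable to changing the init script is a judgment call.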