PM(1) PM(1)
NAME
pm - process monitor
SYNOPSIS
pm [options] [cmd ...]
DESCRIPTION
pm is a simple but extensible process monitor written in
akanga(1). It should be run on a regular basis, either
from cron(8) or from its own scheduler pmd(1). Based on its
configuration, pm checks whether all configured services are
running and reports any that are not found. If a process is
found not to be running, pm tries to restart the service if
a restart program is given. pm keeps track of what it has
already done and performs its operations (error notification
and restart attempt) only once. So if a service monitored by
pm is down for some hours, the notification is sent only
once. A status change of the monitored process (it is up
again) clears these locks.
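The once-only behaviour can be pictured with a small sketch.
This is not pm's code (pm is written in akanga); the class
name NotifyOnce and its method are hypothetical and only
illustrate the lock that is set on the first report and
cleared when the service comes back up.

```python
# Hypothetical illustration of once-only notification; not pm's own code.
class NotifyOnce:
    def __init__(self):
        # Services that have already been reported as down.
        self._locks = set()

    def update(self, service, running):
        """Return True only when a notification should be sent now."""
        if running:
            # A status change back to "up" clears the lock.
            self._locks.discard(service)
            return False
        if service in self._locks:
            # Already reported while down: stay silent.
            return False
        self._locks.add(service)
        return True
```

Calling update("inetd", False) twice in a row would report
only the first time; after one update("inetd", True) the next
failure is reported again.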
While pm itself can only watch processes, it can be
extended through additional modules. In this context pm's
function is to keep track of which errors are new
(bookkeeping) and have to be reported (notifying), whether a
process is partially (repairing) or completely out of service
(restarting), and also to do the bookkeeping for restart and
repair operations. Bookkeeping and notifying can be used
by other programs as well (see INTERFACE FUNCTIONS below),
but for these functions pm has no understanding of the
affected service.
pm records errors as triples of process, component and
object, with an optional error message as a fourth field.
In this triple, object is meant to be a unique identifier
for this type of error and component one of the process's
components. For example, if pm is configured to monitor
that inetd has port 21 allocated in the listening state,
the process would be `inetd', the component `port' and
the object `inetd.tcp.21'. For each error triple pm
records the combination of component and object and puts
the affected service on its internal error list (note
that ok triples are not recorded; they only remove error
tags). After pm has run all its configured modules a
service may be in one of three states:
operational
none of the checks returned an error.
repair some checks returned an error while others did not
(think of an inetd-allocated port that was closed
due to port strobing while inetd is still running).
restart
all checks returned an error (inetd crashed).
Depending on this status pm will either do nothing or start
the repair or restart program, if one is configured for the
affected process.
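The three states above can be derived from the per-check
results of one service. The following is a minimal sketch,
not pm's implementation; the function name service_state is
hypothetical, and treating a service without any check
results as operational is an assumption.

```python
def service_state(results):
    """Classify a service from its check results (True means error).

    Assumption: a service with no check results counts as operational.
    """
    if not any(results):
        return "operational"   # none of the checks returned an error
    if all(results):
        return "restart"       # all checks returned an error
    return "repair"            # some checks failed, others did not
```

For the inetd example, a closed port with a running process
would give service_state([True, False]) and therefore the
repair state.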
Logging
pm can send notifications about what's going on through
several channels.
syslog pm always writes messages to the local syslog
system. This cannot be disabled. Syslog
messages always have the format
pm: flag: message: object
where flag is either `+OK' or `-ERR', message the
optional diagnostic message and object the object
from the monitor's status response.
mail If the mailer and mailto options are set in pm's
configuration it will call the mailer script for
sending e-mails.
network messages
If the sendmsg and msghosts options are set, pm
calls the sendmsg program to send its
notifications to the given msghosts.
The mailer and sendmsg programs receive their parameters
through environment variables. These are
MSG_DATE
the current date and time.
MSG_HOSTNAME
the host's hostname.
MSG_IPNUM
the host's first IP number.
MSG_MAILTO
the list of e-mail recipients, only set for the
mailer program.
MSG_MESSAGE
the pm generated message (one line).
MSG_MSGHOSTS
the list of configured msghosts, only set for the
sendmsg program.
MSG_SUBJECT
the pm generated e-mail subject, only set for the
mailer program.
Note that mailer and sendmsg have the intended uses
described above, but pm does not care what these programs
actually do.
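A mailer program only has to read the MSG_* variables listed
above. The sketch below shows one possible way to do this; it
is an illustration, not a program shipped with pm. The name
build_mail, the layout of the body line and the assumption
that MSG_MAILTO is separated by whitespace are all
hypothetical.

```python
import os

def build_mail(env=os.environ):
    """Compose (recipients, subject, body) from the MSG_* variables.

    Assumption: MSG_MAILTO holds whitespace-separated addresses.
    """
    recipients = env.get("MSG_MAILTO", "").split()
    subject = env.get("MSG_SUBJECT", "")
    # The body layout is an arbitrary choice for this example.
    body = "{date} {host} ({ip}): {msg}".format(
        date=env.get("MSG_DATE", ""),
        host=env.get("MSG_HOSTNAME", ""),
        ip=env.get("MSG_IPNUM", ""),
        msg=env.get("MSG_MESSAGE", ""),
    )
    return recipients, subject, body
```

A real mailer would hand the result to a mail transport of
its choice; a sendmsg program would read MSG_MSGHOSTS instead
of MSG_MAILTO and MSG_SUBJECT.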
Every status change will be reported only once. pm's
resubmit function resends active errors to syslog but does
not trigger the other notifiers.
Automatic Maintenance
pm implements some automatic jobs.
Nightly expiration
Errors that have not been reported as active for more
than 240 minutes are automatically removed from
pm's error list. Such events are logged through
the configured channels. This cleanup job runs
automatically at night but may also be triggered
with `pm cleanup'.
Reminders
Every night pm writes active errors to syslog and
(if configured) calls the mailer program, which
receives the list of active errors on its standard
input.
Reboot detection
If pm recognizes a system reboot (by uptime
monitoring) it will clear all errors and delay its
first execution for one invocation cycle to let the
system start up completely. This event is logged
through the configured channels.
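The nightly expiration described above amounts to a cutoff
on the time an error was last reported. A minimal sketch
follows, assuming the error list is a mapping from an error
key to the time of its last report in seconds; this data
layout and the name expire are hypothetical and say nothing
about how pm stores its state.

```python
EXPIRE_MINUTES = 240  # value taken from the description above

def expire(errors, now):
    """Return (kept, expired) for a dict of key -> last report time.

    Times are in seconds; the dict layout is an assumption.
    """
    limit = now - EXPIRE_MINUTES * 60
    # Keep every error that was reported within the last 240 minutes.
    kept = {key: t for key, t in errors.items() if t >= limit}
    expired = sorted(key for key in errors if key not in kept)
    return kept, expired
```

The expired keys are the ones that would be logged through
the configured channels.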
Modules
pm's monitoring can be extended with additional modules
that are run automatically by pm; see below for the
configuration. Such a module may receive its configuration
data on standard input if this is configured; otherwise it
has to obtain its configuration on its own. pm expects status
information from the modules in a defined format on the
module's standard output. This format is
process
this is the name of the affected service, or a dash
`-' if this status message is not related to any
service (e.g. hard disk full).
object a unique identifier for the object to which the
message is related.
flag the status flag, one of `error', `ok' and
`pending' (or `active'). The `pending' status is
meant for situations where it is not clear whether
there is an error. For example, a hard disk is `ok'
when it is filled to less than 50% of its capacity
and in `error' when it is above 85%. The interval
between 50% and 85% would then be `pending': it is
not an error if the allocated disk space grows from
below 50%, but an error stays active if more than
85% was consumed before.
message
an optional diagnostic message.
The monitor program is expected to print status information
about each of its monitored objects, otherwise active
errors will expire.
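The hard disk example can be turned into a small module
sketch. The thresholds of 50% and 85% come from the text
above; everything else is an assumption made for
illustration: the function names, the object name
`disk.var', and in particular the field separator, since the
text names the fields but not how they are delimited. A
single space is used here.

```python
def disk_flag(percent_used):
    """Map disk usage to a status flag using the 50%/85% example."""
    if percent_used < 50:
        return "ok"
    if percent_used > 85:
        return "error"
    return "pending"  # between 50% and 85%: leave the current state alone

def status_line(process, obj, flag, message=""):
    """Build one status line: process, object, flag, optional message.

    Assumption: fields are separated by single spaces.
    """
    fields = [process, obj, flag]
    if message:
        fields.append(message)
    return " ".join(fields)
```

A module would print one such line per monitored object on
every run, for example status_line("-", "disk.var",
disk_flag(70), "70% used"), so that active errors do not
expire.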
CONFIGURATION FILE
pm reads its configuration from /etc/pm.conf.
cycle cycle-time
this option is read by pmd and defines the time in
seconds between two invocations of pm from pmd.
mailer /path/to/mailer
sets the full path to the program that is used for
e-mail notifications.
mailto email-adr ...
sets the email addresses which should receive pm
notifications by mail.
module name config-prefix program
defines an external module named name. The module's
function is performed by program from the
directory /usr/local/sbin/pm.d; see Modules above for
the required output format. If config-prefix is not a
dash `-', pm scans its own configuration file (usually
/etc/pm.conf) and feeds all lines beginning
with that prefix into program's standard input, with
config-prefix removed from each line. If a module
uses configuration from a different file, that file is
expected to be under /etc/pm.d, but this is
up to the module author.
msghosts host ...
sets the list of hosts that should receive
notifications by sendmsg.
proc process-name ...
tells pm which processes it should check.
restart process /path/to/process-restarter
sets the program that is used to restart the given
process if it is found to be completely down.
repair process /path/to/process-repairer
sets the program that is used to repair the given
process if it's found to be in an inconsistent
state.
runlevel runlevel-list
tells pm in which runlevels it should run.
sendmsg /path/to/sendmsg
sets the full path to the program that is used for
non-e-mail notifications.
startpmd /path/to/pmd-starter
sets the program that is used to start pmd if pm is
started to check for pmd and pmd is found not to be
running.
pm allows comments (lines beginning with a `#') and blank
lines in its configuration file.
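The config-prefix mechanism of the module option can be
sketched as a simple filter over the configuration file.
This is only an illustration under assumptions: the sample
lines, the prefix `port:', the module program name
`portcheck' and the function name module_input are
hypothetical, and whether pm strips whitespace after the
prefix is not stated in this page.

```python
# Hypothetical pm.conf fragment; the "port:" lines are invented for this sketch.
SAMPLE = """\
# comments and blank lines are allowed

cycle 300
proc inetd sshd
module port port: portcheck
port:inetd tcp 21
port:sshd tcp 22
"""

def module_input(config_text, prefix):
    """Collect the lines a module would receive on standard input."""
    if prefix == "-":
        return []  # a dash means the module gets no configuration from pm
    lines = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped.startswith(prefix):
            # Remove the prefix; trimming the remainder is an assumption.
            lines.append(stripped[len(prefix):].lstrip())
    return lines
```

With the sample above, module_input(SAMPLE, "port:") yields
the two lines `inetd tcp 21' and `sshd tcp 22'.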
INTERFACE FUNCTIONS
If called without command line arguments pm performs its
regular monitor function. Otherwise it expects one of the
following commands:
cleanup
removes all error, restart and repair marks that
are older than 240 minutes.
clearerror service component object [msg]
clears the error flag for the given component and
object. The service may be any value but should be
a dash `-' since pm may make use of this parameter
in a future release. The msg parameter is
optional.
getval key
returns the configuration value for key from pm's
configuration file.
log text
logs the given text and sends notifications if
configured.
repeaterror service component object
updates the timestamp of an error for component and
object but will not create an error if it doesn't
already exist. See also clearerror.
report prints all active errors.
resubmit
prints all active errors to stdout and syslog.
seterror service component object [msg]
sets the error flags for the given component and
object. See also clearerror.
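Other programs can use pm's bookkeeping by calling the
commands above. The sketch below builds the argument list
for seterror, clearerror and repeaterror from Python; the
helper names pm_argv and run_pm are hypothetical, and it is
assumed that pm is found on the PATH. The service defaults
to a dash, as recommended for clearerror.

```python
import subprocess

def pm_argv(command, component, obj, msg=None, service="-", pm="pm"):
    """Build the argument list for one of pm's error commands."""
    if command not in ("seterror", "clearerror", "repeaterror"):
        raise ValueError("unsupported command: " + command)
    argv = [pm, command, service, component, obj]
    # repeaterror takes no message according to the synopsis above.
    if msg is not None and command != "repeaterror":
        argv.append(msg)
    return argv

def run_pm(command, component, obj, msg=None):
    """Run pm and return its exit status (assumes pm is on the PATH)."""
    return subprocess.run(pm_argv(command, component, obj, msg)).returncode
```

For the inetd example, pm_argv("seterror", "port",
"inetd.tcp.21", "port closed") corresponds to the command
line `pm seterror - port inetd.tcp.21 "port closed"'.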
OPTIONS
pm supports the following options:
-c do a cron-scheduled check whether pmd is running.
-d print some diagnostic output to stderr for
debugging.
-f configfile
use configfile instead of /etc/pm.conf.
SEE ALSO
pmd(1).
03 OCT 2003 PM(1)