PM(1)                                                       PM(1)

       pm - process monitor

       pm [options] [cmd ...]

       pm is a simple but extensible process monitor written in
       akanga(1).  It should be run on a regular basis, either
       from cron(8) or from its own scheduler pmd(1).  Based on
       its configuration, pm checks whether all configured
       services are running, reporting any that is not found.
       If a process is found not to be running, pm tries to
       restart the service if a restart program is given.  pm
       keeps track of what it has already done and performs its
       operations (error notification and restart attempt) only
       once.  So if a pm-monitored service is down for some
       hours, the notification is sent only once.  A status
       change of the monitored process (it is up again) clears
       these locks.

       While pm itself can only watch processes, it can be
       extended through additional modules.  In this context pm
       keeps track of which errors are new (bookkeeping) and have
       to be reported (notifying), and of whether a process is
       partially (repairing) or completely (restarting) out of
       service; it also does the bookkeeping for restart and
       repair operations.  Bookkeeping and notifying can be used
       by other programs as well (see INTERFACE FUNCTIONS below),
       but for these functions pm has no understanding of the
       affected service.

       pm records errors as triples: process, component and
       object, with an optional error message as a fourth field.
       In this triple, object is meant to be a unique identifier
       for this type of error and component one of the process's
       components.  If, for example, pm is configured to check
       that inetd has port 21 allocated in the listening state,
       the process would be `inetd', the component `port' and
       the object `inetd.tcp.21'.  For each error triple pm
       records the combination of component and object and puts
       the affected service on its internal error list (notice
       that ok triples are not recorded; they only remove error
       tags).  After pm has run all its configured modules, a
       service may be in one of three states:

              none of the checks returned an error.

       repair some checks returned an error while others did not
              (think of an inetd-allocated port that was closed
              due to port strobing while inetd is still running).

              all checks returned an error (inetd crashed).

       Depending on this status pm will either do nothing,  start
       the repair or restart program if one is configured for the
       affected process.
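       The three-way decision above can be sketched in Python.
       This is a minimal illustration under stated assumptions,
       not pm's akanga code: the function name service_state is
       hypothetical, and the labels `ok' and `restart' are our
       own names for the first and third states (only `repair'
       appears on this page).

```python
from typing import Iterable


def service_state(check_results: Iterable[bool]) -> str:
    """Classify a service from its per-check results.

    Each element is True if that check reported an error.
    The labels 'ok' and 'restart' are illustrative assumptions.
    """
    results = list(check_results)
    errors = sum(results)            # True counts as 1
    if errors == 0:
        return "ok"                  # no check failed: do nothing
    if errors < len(results):
        return "repair"              # partially down: run the repair program
    return "restart"                 # every check failed: run the restart program


# inetd is still running, but one of its allocated ports was closed
print(service_state([False, True]))  # repair
# inetd crashed: the process check and the port check both fail
print(service_state([True, True]))   # restart
```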

       pm can send notifications about what's  going  on  through
       several channels.

       syslog pm always writes messages to the local syslog
              system.  This cannot be disabled.  Syslog messages
              always have the format

              pm: flag: message: object

              where flag is either `+OK' or `-ERR', message is
              the optional diagnostic message and object is the
              object from the monitor's status response.

       mail   If the mailer and mailto options are set in pm's
              configuration, pm calls the mailer program to send
              e-mails.

       network messages
              If the sendmsg and msghosts options are set, pm
              calls the sendmsg program to send its notifications
              to the given msghosts.
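       The syslog line layout described above can be sketched as
       a small formatting function.  The name syslog_line is
       hypothetical, and how pm renders an empty message is not
       stated here; this sketch simply keeps the empty field.

```python
def syslog_line(ok: bool, message: str, obj: str) -> str:
    """Build a line in the documented form 'pm: flag: message: object'."""
    flag = "+OK" if ok else "-ERR"
    # Assumption: an empty (optional) message is kept as an empty field.
    return f"pm: {flag}: {message}: {obj}"


print(syslog_line(False, "port not listening", "inetd.tcp.21"))
# pm: -ERR: port not listening: inetd.tcp.21
```

       On Unix systems such a line could be handed to
       syslog.syslog() from Python's standard library; pm itself
       does its logging from akanga.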

       The  mailer  and sendmsg programs receive their parameters
       through environment variables.  These are

              the current date and time.

              the host's hostname.

              the host's first IP address.

              the list of e-mail recipients,  only  set  for  the
              mailer program.

              the pm generated message (one line).

              the  list  of configured msghosts, only set for the
              sendmsg program.

              the pm generated e-mail subject, only set  for  the
              mailer program.

       Notice that mailer and sendmsg are intended for the uses
       described above, but pm does not care what these programs
       actually do.

       Every  status  change  will  be  reported only once.  pm's
       resubmit function resends active errors to syslog but does
       not trigger the other notifiers.
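       The report-once rule and the syslog-only resubmit can be
       sketched as a small dispatcher.  This is a hypothetical
       illustration: the class name Notifier and the message text
       are assumptions, and the channel callables stand in for
       syslog, the mailer program and the sendmsg program.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (component, object)


class Notifier:
    """Report each status change once; resubmit goes to syslog only."""

    def __init__(self, syslog: Callable[[str], None],
                 others: List[Callable[[str], None]]) -> None:
        self.syslog = syslog                # always-on channel
        self.others = others                # e.g. mailer, sendmsg (hypothetical)
        self.active: Dict[Key, str] = {}    # active errors -> message

    def _send_all(self, text: str) -> None:
        self.syslog(text)
        for send in self.others:
            send(text)

    def update(self, key: Key, is_error: bool, msg: str = "") -> None:
        was_error = key in self.active
        if is_error and not was_error:      # new error: notify everywhere
            self.active[key] = msg
            self._send_all(f"-ERR: {msg}: {key[1]}")
        elif not is_error and was_error:    # recovered: notify everywhere
            del self.active[key]
            self._send_all(f"+OK: {msg}: {key[1]}")
        # no status change: stay silent

    def resubmit(self) -> None:
        # Active errors are re-sent to syslog only, not to other notifiers.
        for (_, obj), msg in self.active.items():
            self.syslog(f"-ERR: {msg}: {obj}")


log: List[str] = []
mail: List[str] = []
n = Notifier(log.append, [mail.append])
n.update(("port", "inetd.tcp.21"), True, "closed")
n.update(("port", "inetd.tcp.21"), True, "closed")   # still down: no new notice
n.resubmit()
print(len(log), len(mail))   # 2 1
```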

   Automatic Maintenance
       pm implements some automatic jobs.

       Nightly expiration
              Errors that have not been reported as active for
              more than 240 minutes are automatically removed
              from pm's error list.  Such events are logged
              through the configured channels.  This cleanup job
              runs automatically at night but can also be
              triggered by `pm cleanup'.

              Every night pm writes the active errors to syslog
              and (if configured) calls the mailer program, which
              receives the list of active errors on its standard
              input.

       Reboot detection
              If pm recognizes a system reboot (by monitoring the
              uptime), it clears all errors and delays its first
              execution for one invocation cycle to let the
              system start up completely.  This event is logged
              through the configured channels.
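       Reboot detection by uptime can be sketched as follows,
       assuming a reboot shows up as an uptime that is smaller
       than the one seen on the previous run.  read_uptime uses
       the Linux-specific /proc/uptime file, and RebootGuard is a
       hypothetical name.  Because pm starts as a fresh process on
       each invocation, a real implementation would keep the last
       uptime in a state file; this sketch keeps it in memory.

```python
from typing import Dict, Optional


def read_uptime(path: str = "/proc/uptime") -> float:
    """Seconds since boot (Linux-specific: first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def detect_reboot(last_uptime: Optional[float], uptime: float) -> bool:
    """Assume a reboot happened if the uptime went backwards since the last run."""
    return last_uptime is not None and uptime < last_uptime


class RebootGuard:
    """Sketch of the reboot handling described above (not pm's actual code)."""

    def __init__(self) -> None:
        self.last_uptime: Optional[float] = None
        self.errors: Dict[str, str] = {}     # stand-in for pm's error list

    def should_run(self, uptime: float) -> bool:
        """Return False for the one cycle that is skipped after a reboot."""
        rebooted = detect_reboot(self.last_uptime, uptime)
        self.last_uptime = uptime
        if rebooted:
            self.errors.clear()   # all errors are cleared after a reboot
            return False          # the first check is delayed by one cycle
        return True


guard = RebootGuard()
for up in (1000.0, 1300.0, 120.0, 420.0):   # third sample: machine rebooted
    print(up, guard.should_run(up))          # True, True, False, True
```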

       pm's monitoring can be extended with additional modules
       that are run automatically by pm; see below for the
       configuration.  Such a module may receive its
       configuration data on standard input if this is
       configured; otherwise it has to handle this itself.  pm
       expects status information from the modules in a defined
       format on the module's standard output.  This format is

              the name of the affected service, or a dash `-' if
              this status message is not related to any service
              (e.g. harddisk full).

       object a  unique  identifier  for  the object to which the
              message is related.

       flag   the status flag, one of `error', `ok' and `pending'
              (or `active').  The `pending' status is meant for
              situations where it is not clear whether there is
              an error or not.  For example, a harddisk is `ok'
              when it is filled to less than 50% of its capacity
              and in `error' when it is above 85%.  The interval
              between 50% and 85% would then be `pending': it is
              not an error if the allocated disk space grows from
              below 50%, but it keeps the error active if more
              than 85% was consumed before.

              an optional diagnostic message.

       The monitor program is expected to print status
       information about each of its monitored objects;
       otherwise active errors will expire.
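       A minimal module sketch for the harddisk example above,
       with the 50% and 85% thresholds taken from the text.  The
       exact output line layout is not reproduced on this page;
       this sketch assumes whitespace-separated fields in the
       order service, object, flag, message.  The object names
       (`disk./') and function names are hypothetical.
       apply_flag shows how `pending' keeps an existing error
       active without creating a new one.

```python
import shutil
from typing import List


def disk_flag(percent_used: float) -> str:
    """Map disk usage to a status flag using the example thresholds."""
    if percent_used < 50:
        return "ok"
    if percent_used > 85:
        return "error"
    return "pending"          # 50%..85%: neither clearly fine nor broken


def status_line(service: str, obj: str, flag: str, message: str = "") -> str:
    # Assumed layout: whitespace-separated fields, optional message last.
    return f"{service} {obj} {flag} {message}".rstrip()


def apply_flag(active: bool, flag: str) -> bool:
    """Whether an error stays active after this report (pm-side view)."""
    if flag == "error":
        return True
    if flag == "ok":
        return False
    return active             # 'pending' (or 'active') keeps the previous state


def main(mounts: List[str]) -> None:
    for mount in mounts:
        usage = shutil.disk_usage(mount)
        pct = 100.0 * usage.used / usage.total
        # '-' because a full disk is not tied to any monitored service
        print(status_line("-", f"disk.{mount}", disk_flag(pct), f"{pct:.0f}% used"))


if __name__ == "__main__":
    main(["/"])
```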

       pm reads its configuration from /etc/pm.conf.

       cycle cycle-time
              this  option is read by pmd and defines the time in
              seconds between two invocations of pm from pmd.

       mailer /path/to/mailer
              sets the full path to the program that is used for
              e-mail notifications.

       mailto email-adr ...
              sets  the  email  addresses which should receive pm
              notifications by mail.

       module name config-prefix program
              defines an external module named name.  The
              module's function is performed by program from the
              directory /usr/local/sbin/pm.d; see above for the
              required output format.  If config-prefix is not a
              dash `-', pm scans its own configuration file
              (usually /etc/pm.conf) and feeds all lines
              beginning with that prefix into program's standard
              input, with config-prefix removed from each line.
              If a module uses configuration from a different
              file, that file is expected to be under /etc/pm.d,
              but this is up to the module author.

       msghosts host ...
              sets the list of hosts that should receive  notifi-
              cations by sendmsg.

       proc process-name ...
              tells pm which processes it should check.

       restart process /path/to/process-restarter
              sets the program that is used to restart the given
              process if it is found to be completely down.

       repair process /path/to/process-repairer
              sets the program that is used to repair the given
              process if it is found to be in an inconsistent
              state.
       runlevel runlevel-list
              tells pm in which runlevels it should run.

       sendmsg /path/to/sendmsg
              sets the full path to the program that is used for
              non-e-mail notifications.

       startpmd /path/to/pmd-starter
              sets the program that is used to start pmd if pm is
              started to check for pmd and pmd is found not to be
              running.

       pm allows comments (lines beginning with a `#') and blank
       lines in its configuration file.
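       The configuration rules above (comments, blank lines,
       repeated keys, and feeding prefixed lines to a module) can
       be sketched as two helpers.  This is an illustration, not
       pm's parser: the function names and the sample entries
       (module `disk', prefix `disk.', the mailer path) are
       hypothetical.  Whether pm also strips whitespace after the
       prefix is not stated; this sketch removes only the prefix.

```python
from typing import Dict, List


def read_config(lines: List[str]) -> Dict[str, List[str]]:
    """Parse pm.conf-style lines into key -> list of values.

    Blank lines and lines beginning with '#' are skipped.  A key may
    occur several times (e.g. one 'module' line per module), so every
    occurrence is kept in order.
    """
    conf: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)          # key, then the rest of the line
        value = parts[1].strip() if len(parts) > 1 else ""
        conf.setdefault(parts[0], []).append(value)
    return conf


def module_input(lines: List[str], prefix: str) -> List[str]:
    """Lines a module would receive on stdin: prefixed lines, prefix removed."""
    if prefix == "-":
        return []                            # '-': no configuration from pm
    return [raw.rstrip("\n")[len(prefix):]
            for raw in lines if raw.startswith(prefix)]


# Hypothetical configuration; module name, prefix and paths are made up.
sample = [
    "# pm configuration",
    "",
    "proc inetd sshd",
    "mailer /usr/local/bin/pm-mailer",
    "module disk disk. diskcheck",
    "disk.mounts / /var",
]
conf = read_config(sample)
print(conf["proc"])                   # ['inetd sshd']
print(module_input(sample, "disk."))  # ['mounts / /var']
```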

       If called without command line arguments, pm performs its
       regular monitoring function.  Otherwise it expects one of
       the following commands:

       cleanup
              removes all error, restart and repair marks that
              are older than 240 minutes.

       clearerror service component object [msg]
              clears  the  error flag for the given component and
              object.  The service may be any value but should be
              a  dash `-' since pm may make use of this parameter
              in  a  future  release.   The  msg   parameter   is

       getval key
              returns the configuration value for key from its
              configuration file.

       log text
              logs the given text and sends notifications if
              configured.
       repeaterror service component object
              updates the timestamp of an error for component and
              object but will not create an error if  it  doesn't
              already exist.  See also clearerror.

       report prints all active errors.

       resubmit
              prints all active errors to stdout and syslog.

       seterror service component object [msg]
              sets  the  error  flags for the given component and
              object.  See also clearerror.
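       The bookkeeping behind seterror, repeaterror, clearerror
       and cleanup can be sketched as an in-memory store keyed on
       (component, object), as described above.  This is a
       hypothetical illustration: the class name ErrorStore is an
       assumption, the service argument is left out for brevity,
       and only error marks (not restart or repair marks) are
       expired.

```python
import time
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]            # (component, object)
EXPIRY_SECONDS = 240 * 60        # errors older than 240 minutes expire


class ErrorStore:
    """In-memory sketch of pm's error bookkeeping commands."""

    def __init__(self) -> None:
        self.errors: Dict[Key, Tuple[float, str]] = {}   # key -> (timestamp, msg)

    def seterror(self, component: str, obj: str, msg: str = "",
                 now: Optional[float] = None) -> bool:
        """Mark an error; return True if it is new (i.e. worth notifying)."""
        now = time.time() if now is None else now
        key = (component, obj)
        is_new = key not in self.errors
        self.errors[key] = (now, msg)
        return is_new

    def repeaterror(self, component: str, obj: str,
                    now: Optional[float] = None) -> None:
        """Refresh an existing error's timestamp; never creates one."""
        key = (component, obj)
        if key in self.errors:
            _, msg = self.errors[key]
            self.errors[key] = (time.time() if now is None else now, msg)

    def clearerror(self, component: str, obj: str) -> bool:
        """Remove an error; return True if one was actually cleared."""
        return self.errors.pop((component, obj), None) is not None

    def cleanup(self, now: Optional[float] = None) -> List[Key]:
        """Drop errors not refreshed within EXPIRY_SECONDS; return their keys."""
        now = time.time() if now is None else now
        expired = [k for k, (ts, _) in self.errors.items()
                   if now - ts > EXPIRY_SECONDS]
        for k in expired:
            del self.errors[k]
        return expired


store = ErrorStore()
print(store.seterror("port", "inetd.tcp.21", "closed"))   # True: new error
print(store.seterror("port", "inetd.tcp.21", "closed"))   # False: already known
```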

       pm supports the following options:

       -c     do a cron-scheduled check whether pmd is running.

       -d     print some diagnostic output to stderr for
              debugging.
       -f configfile
              use configfile instead of /etc/pm.conf.


                           03 OCT 2003                      PM(1)