Category: HPOM

UXMON: volumegroup – Only one path detected, no path redundancy

Also see:
UXMON: mpathb – Only one path detected, no path redundancy
UXMON: SY1_log2_disk_001 – Only one path detected, no path redundancy

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : major
OM Server Time: 2015-10-26 08:13:59
Message : UXMON: volumegroup – Only one path detected, no path redundancy
Msg Group : OS
Application : mpmon
Object : mp
Event Type : not_found
Instance Name : not_found

Instruction : The multipathd -k"show map $device topology" command shows more details

Please check /var/opt/OV/log/OpC/mp_mon.log for more details
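
To confirm what the alert describes, you can count the paths behind a map yourself. The sketch below is a hypothetical helper, not part of UXMON: count_paths reads multipath -ll output on standard input and counts the lines that carry a host:channel:target:lun address. It assumes the usual device-mapper-multipath output layout; the map name mpathb is taken from the related alerts listed above.

```shell
# Hypothetical helper (not part of UXMON): count the path lines in
# `multipath -ll` output read from stdin. A path line carries a SCSI
# address of the form host:channel:target:lun followed by a device name.
count_paths() {
    grep -cE '[0-9]+:[0-9]+:[0-9]+:[0-9]+[[:space:]]+[[:alnum:]]+'
}

# Usage on the affected node (needs root and device-mapper-multipath):
#   multipath -ll mpathb | count_paths
#   multipathd -k"show map mpathb topology"
#   tail -n 20 /var/opt/OV/log/OpC/mp_mon.log
```

A result of 1 matches the alert: the map has a single path. Note that grep -c prints 0 and returns a non-zero exit status when no path line is found.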

This is a virtual server, as the VMware devices in the lspci output show:

root@linux:~ # lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:07.7 System peripheral: VMware Virtual Machine Communication Interface (rev 10)
00:0f.0 VGA compatible controller: VMware SVGA II Adapter
00:11.0 PCI bridge: VMware PCI bridge (rev 02)
00:15.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.7 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.7 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.7 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.7 PCI bridge: VMware PCI Express Root Port (rev 01)
03:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
13:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
1b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)

I disabled the mpmon module in its configuration file and confirmed the setting by running the monitor in debug mode:

root@linux:~ # vi /var/opt/OV/conf/OpC/mp_mon.cfg
disable = yes

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -d mpmon
>>Debug mode activated
>>Opened the logfile: /var/opt/OV/log/OpC/mp_mon.log
>>multipathd is running with just one daemon
>>UXMONmdmon::PARSER_CFG start parsing the md_mon.cfg…….
>>Exit because of disable setting
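
The same change can be made without opening an editor. This is a minimal sketch, assuming the module only needs an active "disable = yes" line as shown above; disable_module is a hypothetical helper name, and the sed call assumes GNU sed, which keeps a .bak copy of the file.

```shell
# Hypothetical helper: make sure FILE contains an active "disable = yes" line.
disable_module() {
    cfg=$1
    if grep -qE '^[[:space:]]*disable[[:space:]]*=' "$cfg"; then
        # An active setting exists: rewrite it in place (a .bak copy is kept).
        sed -i.bak -E 's/^[[:space:]]*disable[[:space:]]*=.*/disable = yes/' "$cfg"
    else
        # Only commented examples exist: append the setting.
        printf 'disable = yes\n' >> "$cfg"
    fi
}

# Usage:
#   disable_module /var/opt/OV/conf/OpC/mp_mon.cfg
#   /var/opt/OV/bin/instrumentation/UXMONbroker -d mpmon
```

Commented lines such as "# disable = yes" are left untouched, so the examples in the file header stay readable.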

HPOM threshold configuration files

Here are the files that you may want to copy to /var/opt/OV/conf/OpC and then customize:

root@linux:/var/opt/OV/bin/instrumentation # ls -l *.cfg
-rwxr-xr-x. 1 root root 10305 Oct 21 09:05 act_mon.cfg
-rwxr-xr-x. 1 root root 1987 Oct 21 09:05 bond_mon.cfg
-rwxr-xr-x. 1 root root 4330 Oct 21 09:05 boot_mon.cfg
-rwxr-xr-x. 1 root root 3809 Oct 21 09:05 cron_mon.cfg
-rwxr-xr-x. 1 root root 9981 Oct 21 09:05 df_mon.cfg
-rwxr-xr-x. 1 root root 1484 Oct 21 09:05 dmsg_mon.cfg
-rwxr-xr-x. 1 root root 1324 Oct 21 09:05 hw_mon.cfg
-rwxr-xr-x. 1 root root 1493 Oct 21 09:05 kts_mon.cfg
-rwxr-xr-x. 1 root root 1404 Oct 21 09:05 loop_mon.cfg
-rwxr-xr-x. 1 root root 2017 Oct 21 09:05 lp_mon.cfg
-rwxr-xr-x. 1 root root 2150 Oct 21 09:05 md_mon.cfg
-rwxr-xr-x. 1 root root 2567 Oct 21 09:05 mp_mon.cfg
-rwxr-xr-x. 1 root root 8119 Oct 21 09:05 nfs_mon.cfg
-rwxr-xr-x. 1 root root 1607 Oct 21 09:05 nic_mon.cfg
-rwxr-xr-x. 1 root root 1786 Oct 21 09:05 ntp_mon.cfg
-rwxr-xr-x. 1 root root 10023 Oct 21 09:05 ps_mon.cfg
-rwxr-xr-x. 1 root root 4878 Oct 21 09:05 rc_mon.cfg
-rwxr-xr-x. 1 root root 1832 Oct 21 09:05 sc_mon.cfg
-rwxr-xr-x. 1 root root 3778 Oct 21 09:05 scsi_mon.cfg
-rwxr-xr-x. 1 root root 2919 Oct 21 09:05 sg_mon.cfg
-rwxr-xr-x. 1 root root 1956 Oct 21 09:05 sshd_mon.cfg
-rwxr-xr-x. 1 root root 1831 Oct 21 09:05 svcs_mon.cfg
-rwxr-xr-x. 1 root root 2860 Oct 21 09:05 swap_mon.cfg
-rwxr-xr-x. 1 root root 7925 Oct 21 09:05 UXMONbroker.cfg
-rwxr-xr-x. 1 root root 3947 Oct 21 09:05 UXMONmetrics.cfg
-rwxr-xr-x. 1 root root 3246 Oct 21 09:05 UXMONperf.cfg
-rwxr-xr-x. 1 root root 2966 Oct 21 09:05 uxmon_selfcheck.cfg
-rwxr-xr-x. 1 root root 2280 Oct 21 09:05 uxmonsyslog.cfg
-rwxr-xr-x. 1 root root 2273 Oct 21 09:05 vc_mon.cfg
-rwxr-xr-x. 1 root root 5479 Oct 21 09:05 vol_mon.cfg
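
Copying a default file into place can be scripted so that an already customized copy is never overwritten. The helper below is a hypothetical sketch; stage_cfg is not an HPOM command, and the two default directories are simply the ones named in this post.

```shell
# Hypothetical helper: copy one default config into the active directory,
# but never overwrite a file that has already been customized there.
stage_cfg() {
    name=$1
    src_dir=${2:-/var/opt/OV/bin/instrumentation}
    dst_dir=${3:-/var/opt/OV/conf/OpC}
    if [ -e "$dst_dir/$name" ]; then
        echo "skip: $dst_dir/$name already exists"
    else
        cp -p "$src_dir/$name" "$dst_dir/$name" && echo "copied: $name"
    fi
}

# Usage: stage_cfg df_mon.cfg
```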

act_mon.cfg

###############################################################################
# GD UX MON #
#@(#) $Id: act_mon.cfg 2180 2015-05-11 09:11:18Z baoliz $
#@(#) $Rev: 2180 $
#@(#) $Author: baoliz $
#@(#) $Date: 2015-05-11 17:11:18 +0800 (Mon, 11 May 2015) $
#@(#) $LastChangedBy: baoliz $
###############################################################################

#############################################################################
# File: act_mon.cfg
# Description: The File Activity Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
# The intention of this script is to monitor the last modification time of a
# file, or to monitor its size. This is used to supervise other programs or
# scripts which have to write regularly to their logfile. If a program or a
# script doesn’t modify “its” file, there is probably something wrong with this process.
#
# If the configured interval is exceeded for the file which is intended to be
# monitored, or if the size is above or below the configured limit (depending
# on whether the size threshold has a modifier),
# a log-message is written
#
#

################################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
# Filename threshold|MISSING severity schedule group
# ===========================================================================
# /dir/file1 15m WARNING 0000-2400 * MYGROUP
# /dir/file2 50s CRITICAL 0000-2400 * I
# /dir/file3 >100KB WARNING 0000-2400 * OTHERGROUP
# /dir 15n WARNING 0000-2400 *
# /dir/file MISSING
#
# Note wildcards are supported as well
# /dir/file1*4 5n WARNING
# /d*r/file1*4 15m WARNING
# /dir/prod/file1*4 >100KB WARNING
#
# rearm
#==============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If threshold is not qualified is assumed s (seconds)
# If GROUP not specified is assumed the group NONE
# If no qualifier is explicited (> or <) ">" is assumed
#
# THRESHOLD|MISSING
#================
#
# The INTERVAL can be given in different units.Possible units for file age are:
# (no unit) => seconds
# s => seconds
# m => minutes
# h => hours
# d => days
# Possible units for file size are:
# KB => kilobytes
# MB => megabytes
# GB => gigabytes
# File count is configured with a small n
# n => File count
#
# In addition, file sizes may be preceded by the optional ">" or "<" qualifier; the default is ">" (if the file is larger than the threshold,
# a message will be sent).
#
# The file count will count the number of files in a directory(s). Note it
# will exclude directories within directories
#
# There is a special entry that can be used to monitor a directory based on a
# date. For example an application might output to a directory for a particular
# day and you need to check the number of files or that kind. By using
# the template (%FORMAT) the current date will be substituted e.g.
# /tmp/(%DDMMYYYY) monitors /tmp/26032006 . The date format depends on
# the template. The template can be formed with Y, M, D (year, month, day) and h, m, s
# (hour, minute, second) (please, note M vs m). If you want to specify a time format
# with year of 2 digits, use YY , with 4 digits, YYYY.
# For instance, DDMMYYhhmmss is a full date timestamp like 230306152558
#
# MISSING
#===================
# The MISSING option is used to check if the file exists.
# If the argument MISSING is configured, and the file does not exist, it will trigger an alarm to OVO.
# If the MISSING doesn’t cofigure,no alarm triggered to OVO, only report in the LOGFILE.
#
#
# SEVERITY
#===================
# The SEVERITY can be one of the following:
# WARNING, MINOR, MAJOR, CRITICAL (case insensitive)
# If not explicited, it will be assumed warning.
# Please, be considerd with the severities, think twice prior to use CRITICAL, the most severe.
#
# SCHEDULE
#=====================
# You can define the time frame for each line/configuration giving an specific Schedule.
# This means that one file or configuration it will be only considered when the current time falls
# within the scheduled time configured
# The format is <hhmm>-<hhmm> <days>, some examples are:
# 0000-2400 * The basic configuration means in any day in any moment
# 0900-1800 1-5 Means to check from 09:00 am to 18:00 pm from Monday to Friday.
# 0000-2400 6,0 Means at any hour only Saturday and Sunday
#
# ACTIONS
#================
# You can declare an ACTION, that will be triggered if the threshold is
# exceeded. The ACTION follow the syntax *ACTION
# Examples of allowed syntaxis:
# *ACTION rm -f /tmp/*.gz
#
# *ACTION gzip !FSNAME/*.log ; rm !FSNAME/*.txt
#
# Therefore, the ACTION can be in a different line or just following the
# filesystem declaration. (all in one line)
# Besides, you have some variables that will be replaced prior to trigger
# the execution:
# !GROUP (the group used)
# !MESSAGE (the same message that will be write in the logfile)
# !SEVERITY (The severity declared in this alarm)
#
#
#
# PACKAGE. Cluster Awareness
#=============================
# There is another directive , *PACKAGE pkgname, that can be used to inform to ACTMON that
# this resource is managed by a Cluster, and more specifically included in the package named
# pkgname
# The ACTMON , prior to test that file system, it will check if the cluster package is in fact
# running in this local node, if yes, it will process it as any other file system. If it’s not
# running in this node (therefore is running in another one), it will ignore this file system
#
# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP
#
# I
#=============================
# I is configured at the line end that means ignore the alarm even the file attribute breaches the threshold
#
# GROUPING
#==============================
# You can group the alarms, giving them the same GROUP. In that case
# only will be reported the MOST SEVERE ALARM and others will be masked by this one
# If you don’t set a group, is considered then the group NONE.
# Even although the message be masked by another more critical message
# in the same group, its action IT WILL BE EXECUTED. And, proper log message
# will be write.
#
#
# AUTORECOVERY. (TOTAL AUTORECOVERY)
#=====================================
# Each time an alarm happens, an action (if defined) is triggered, then a second
# check will be performed, so this monitoring is based on two phases:
# first, checks the alarms, and triggers the actions if any.
# second, evaluates the alarms again, and, only those who persist will be logged
# In that way, ACTION can help to automatize the maintenance and decrease the number of alarms.
#
#
# CMA SUPPORT
#========================================
# You can access to the EventType and EventTypeInstance using teh [ ] brackets statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType leave EventTypeInstace to default value [OBJECT ,EventType ]
# But you can never use more than three fields or less than 1
# Fields to be separated by commands and blank spaces not allowed
# [ OS, OS, SapApp ]
# process 1- warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#
#
################################################################################
#
# Some examples more
#
# disable this module
# disable = yes

# this module allow to run after every 10 minutes
# interval = 10

#REARM = true

#/var/opt/OV/log/OpC/cl_mon.log 1h CRITICAL 0000-2400 *
#/var/opt/osit/log 10n
#/var/adm/syslog/syslog.log 3d warning 0800-1700 1-5 OPS
#/tmp/telelezing/(%DDMMYY) <2n warning MYGROUP
#/tmp/core MISSING
#
#############################################
# Make your CONFIGURATION FROM HERE #
#############################################
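
To make the age thresholds concrete, here is a minimal sketch of the unit handling described in the THRESHOLD section, plus a manual staleness check. Both function names are hypothetical, and this is not how act_mon itself is implemented; the second function assumes GNU stat, as found on Linux.

```shell
# Convert an age threshold such as 50s, 15m, 1h or 3d into seconds.
# A bare number is taken as seconds, as the DEFAULTS section states.
threshold_to_seconds() {
    value=${1%[smhd]}      # numeric part
    unit=${1#"$value"}     # trailing unit, empty when none was given
    case $unit in
        ''|s) echo "$value" ;;
        m)    echo $((value * 60)) ;;
        h)    echo $((value * 3600)) ;;
        d)    echo $((value * 86400)) ;;
    esac
}

# Succeed when FILE was last modified more than THRESHOLD ago.
# Assumes GNU stat (stat -c %Y prints the modification time in epoch seconds).
file_is_stale() {
    limit=$(threshold_to_seconds "$2")
    age=$(( $(date +%s) - $(stat -c %Y "$1") ))
    [ "$age" -gt "$limit" ]
}

# Usage: file_is_stale /var/opt/OV/log/OpC/cl_mon.log 1h && echo "no update for over 1h"
```

Size (KB, MB, GB) and file-count (n) thresholds are deliberately left out of this sketch.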

bond_mon.cfg

#############################################################################
#@ $Id: bond_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@ $Rev: 2197 $
#@ $Author: baoliz $
#@ $Date: 2015-05-17 18:16:07 +0800 (Sun, 17 May 2015) $
#@ $LastChangedBy: baoliz $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
# exclude <device>
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# exclude
#========================
# The "exclude <device>" syntax is used to force entries of the device name to be ignored.
#

############################################################################
# For example:

# disable this module
# disable = yes

# the mdmon will execute every 5 minutes.
# interval = 5

# exclude some device
# exclude bond0
# exclude bond1

#############################################################################
# end of bond_mon.cfg
############################################################################

boot_mon.cfg

###############################################################################
#@(#) $Id: boot_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $ #
# =========================================================================== #
# Copyright (c) 2006 Hewlett Packard – All Rights Reserved. #
# Author: Cesar Lombao Vazquez #
###############################################################################

#############################################################################
# File: boot_mon.cfg
# Description: The File Activity Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
# The intention of this script is to monitor whether the system has rebooted within 1 day.
#
#
################################################################################
#
# Syntax:
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]

# severity=xxx
# group=yyy
# typeinstance=ccccc
# eventtypeinstance=ttttt
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
#===============
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# WARNING MYGROUP
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If GROUP not specified is assumed the group NONE
#
#
# GROUP
# ===================
# With GROUP it can be influenced in the alarm routing and ticketing interface, knowing that:
# IF the GROUP starts with B_ the alarm will reamin in the browser ONLY
# IF the GROUP starts with TT_ the alarm will be send to Ticketing systems (THIS IS THE DEFAULT BEHAVIOUR)
# IF the GROUP starts with N_ the alarm will be send to the Notification system
# IF the GROUP starts with TN_ the alarm will be sent both to Notif and TT
# IF the GROUP does not start with any of the above alarm will be sent to TroubleTicket ssystem (T_ )
#
# SEVERITY
#===================
# The SEVERITY can be one of the following:
# WARNING, MINOR, MAJOR, CRITICAL (case insensitive)
# If not explicited, it will be assumed warning.
# Please, be considerd with the severities, think twice prior to use CRITICAL, the most severe.
#
# CMA SUPPORT
#========================================
# You can access to the EventType and EventTypeInstance
#
#############################################################################
#
#
################################################################################
#
# Some examples more
#
# severity=warning
# group=MYGROUP
#
#############################################
# Make your CONFIGURATION FROM HERE #
#############################################
#REARM = true

disable = no

severity = major
group = BOOT

eventtype = NONE
eventtypeinstance = NONE
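
The condition this module watches for can be checked by hand. The sketch below is an assumption about what "rebooted within 1 day" means in practice (uptime below 86400 seconds); rebooted_within_day is a hypothetical name, and how boot_mon itself decides is not documented in the file above.

```shell
# Succeed when the uptime (in seconds) is below one day.
# With no argument the value is read from /proc/uptime (Linux only).
rebooted_within_day() {
    uptime_s=${1:-$(cut -d. -f1 /proc/uptime)}
    [ "$uptime_s" -lt 86400 ]
}

# Usage: rebooted_within_day && echo "system rebooted in the last 24 hours"
```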

cron_mon.cfg

###############################################################################
#@(#) $Id: cron_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: cron_mon.cfg
# Description: The cron Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

#############################################################################
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
#
#
#
#
# <command> = Command line triggered by the cron
# <severity> = warning|minor|major|critical (case insensitive)
# <schedule> = hhmm-hhmm [<days>]
# <days> = n[,n...] | * (a range like 1-5 is also possible)
# where n represents day of a week starting with
# Sunday=0 and Saturday=6;
# * means all days
# (if you use 1,* or 3,*,4 alike is not allowed)
#
#
#
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If GROUP not specified is assumed the group NONE
# If schedule not specified is assumed every day , at any time
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
#
#
# GROUPING
#==============================
# You can group the alarms, giving them the same GROUP. In that case
# only will be reported the MOST SEVERE ALARM and others will be masked by this one
# If you don’t set a group, is considered then the group NONE.
# Even although the message be masked by another more critical message
# in the same group, its action IT WILL BE EXECUTED. And, proper log message
# will be write.
#
#
# AUTORECOVERY. TOTAL
#================================
# Due the nature of the cron, there is no autorecovery, therefore
# each time an alarm be ready to be triggered it will be.
#
#
# CMA SUPPORT
#========================================
# You can access to the EventType and EventTypeInstance using teh [ ] brackets statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType leave EventTypeInstace to default value [OBJECT ,EventType ]
# But you can never use more than three fields or less than 1
# Fields to be separated by commands and blank spaces not allowed
# [ OS, OS, CRON ]
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#############################################################################
#################
### EXAMPLES #
#################
#REARM = true
#
#/usr/local/bin/myexecution warning 0000-2400 * MYGROUP
#/opt/tool/bin/mytool MYTOOL

#####################
# End of examples #
#####################

###############################
# Start your config from here #
###############################

#############################################################################
# end of cron_mon.cfg
#############################################################################
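
The schedule field (hhmm-hhmm followed by days, with Sunday=0 and Saturday=6) can be read as a simple predicate. The function below is a hypothetical sketch of that reading, not the parser UXMON uses; the end of the time window is treated as exclusive, which is an assumption.

```shell
# in_schedule WINDOW DAYS [HHMM] [DOW]
# WINDOW is hhmm-hhmm; DAYS is *, a list such as 6,0 or a range such as 1-5.
# HHMM and DOW default to the current time and weekday (0 = Sunday).
in_schedule() {
    window=$1
    days=$2
    now=${3:-$(date +%H%M)}
    dow=${4:-$(date +%w)}
    start=${window%-*}
    end=${window#*-}

    day_ok=no
    if [ "$days" = "*" ]; then
        day_ok=yes
    else
        old_ifs=$IFS
        IFS=,
        for item in $days; do
            case $item in
                *-*) if [ "$dow" -ge "${item%-*}" ] && [ "$dow" -le "${item#*-}" ]; then day_ok=yes; fi ;;
                *)   if [ "$dow" -eq "$item" ]; then day_ok=yes; fi ;;
            esac
        done
        IFS=$old_ifs
    fi

    [ "$day_ok" = yes ] && [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
}

# Usage: in_schedule 0900-1800 1-5 && echo "inside the monitored window"
```

The comparisons use the test command rather than shell arithmetic, so values with leading zeros such as 0900 are read as decimal numbers.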

df_mon.cfg

###############################################################################
#@(#) $Id: df_mon.cfg 2201 2015-05-19 08:07:49Z baoliz $
#@(#) $Rev: 2201 $
#@(#) $Author: baoliz $
#@(#) $Date: 2015-05-19 16:07:49 +0800 (Tue, 19 May 2015) $
#@(#) $LastChangedBy: baoliz $
###############################################################################

#############################################################################
# File: df_mon.cfg
# Description: The Diskspace Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

#############################################################################
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
# <filesystem> exclude
# <filesystem> <space> [<inode>] [<severity>] [<schedule>] [ ; GROUP ]
# [*ACTION action] [*PACKAGE pkgname] [*IPISLOCAL <IP|FQHN>] [*DURATION <minutes>]
#
# <space> = space utilization threshold
# <inode> = inode utilization threshold
# <severity> = normal|warning|minor|major|critical (case insensitive)
# <schedule> = hhmm-hhmm [<days>]
# <days> = n[,n...] | * (a range like 1-5 is also possible)
# where n represents day of a week starting with
# Sunday=0 and Saturday=6;
# * means all days
# (if you use 1,* or 3,*,4 alike is not allowed)
#
# Use '-' to skip threshold parameter specification.
# If parameter is not specified the checking of that value is skipped.
#
# Examples allowed:
# * 95 90 warning
# /opt/SAP* 95 warning
# /tmp - 90% ; MYGROUP
# /home/userx 50Mb - 0000-2400 0-6 ; USERS
# /opt 70 - major 0600-1800 1,2,3,4,5
# /var 75% - critical 0600-2200 *
# /var/opt/OV 5Gb ; OVO (here the severity is taken by default-> warning)
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If threshold is not qualified with % or Gb , or Mb, is assumed %
# If GROUP not specified is assumed the group NONE
#
# THRESHOLD
#================
# The Tresholds can be set specifying the percentage used (the default)
# or absolute values in Mb or Gb. Thresholds valid (only for disk space) are
# 75Mb 34Gb 34 10%
# In the case of 34 is assumed you want to mean 34%
#
# ACTIONS
#================
# You can declare an ACTION, that will be triggered if the threshold is
# exceeded. The ACTION follow the syntax *ACTION
# Examples of allowed syntaxis:
# /tmp 1Mb - warning ; OS *ACTION rm -f /tmp/*.gz
#
# /home/userx 1Mb - warning
# *ACTION gzip !FSNAME/*.log ; rm !FSNAME/*.txt
#
# Therefore, the ACTION can be in a different line or just following the
# filesystem declaration. (all in one line)
# Besides, you have some variables that will be replaced prior to trigger
# the execution:
# !FSNAME (filesystem name)
# !GROUP (the group used)
# !MESSAGE (the same message that will be write in the logfile)
# !REASON (SPACE or INODE, depending the reason that triggered the alarm)
# !THRESHOLD (The Threshold that has been exceeded)
# !CURRENT (The current value compared with the threshold)
# !SEVERITY (The severity declared in this alarm)
#
#
# PACKAGE. Cluster Awareness
#=============================
# There is another directive , *PACKAGE pkgname, that can be used to inform to DFMON that
# this resource is managed by a Cluster, and more specifically included in the package named
# pkgname
# The DFMON , prior to test that file system, it will check if the cluster package is in fact
# running in this local node, if yes, it will process it as any other file system. If it’s not
# running in this node (therefore is running in another one), it will ignore this file system
#
# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP

# DURATION
#=============================
# Ticket is triggered after duration minutes and ticket will be held within DURATION minutes, if DURATION is configured. .

# GROUPING
#==============================
# You can group the alarms, giving them the same GROUP. In that case
# only will be reported the MOST SEVERE ALARM and others will be masked by this one
# If you don’t set a group, is considered then the group NONE.
# Even although the message be masked by another more critical message
# in the same group, its action IT WILL BE EXECUTED. And, proper log message
# will be write.
#
#
# AUTORECOVERY. (TOTAL AUTORECOVERY)
#=====================================
# Each time an alarm happens, an action (if defined) is triggered, then a second
# check will be performed, so this monitoring is based on two phases:
# first, checks the alarms, and triggers the actions if any.
# second, evaluates the alarms again, and, only those who persist will be logged
# In that way, ACTION can help to automatize the maintenance and decrease the number of alarms.
#
#
#========================================
# Duplicated defination of one filesystem
# /tmp 50 50 warning
# /tmp 90 90 critical
# File system /tmp exceeds threshold 50% of disk space usage and 50% of inode usage will send alarm with severity warning
# and when exceeds threshold 90% of disk space usage and 90% of inode usage will send alarm with severity critical
#
# * 50 50 warning
# /tmp 90 90 warning
# Character * is special, all file systems’ threshold represent the 50% of disk space usage and the 50% of inode usage except /tmp,
# /tmp only alarm when exceeds threshold of 90% of disk space usage and 90% of inode usage.
#
# /var/tmp 100GB warning
# /var/tmp 90 90 warning
# Different types of threshold, both lines will be available, /var/tmp will send alarm whatever it has less than 100GB or
# exceeds threshold 90% of disk space usage and 90% of inode usage.
#
#
# EXCLUDE. Ignore filesystems
#=====================================
# you can set the directive Filesystem Exclude, for instance
# /tmp exclude
# /var/opt/SAP* exclude
#
# And these file systems will be ignored and therefore not checked and not executed any action
# However, you must note the order IS RELEVANT, for instance
# /tmp 1 1 warning
# /tmp exclude
# will cause the /tmp be ignored, but
# /tmp exclude
# /tmp 1 1 warning
# will make the /tmp be considered.
#
# A more complex example, lets imagine you have file systems like /var/opt/FS* that you want
# be checked, however, you know there some like /var/opt/FS*tmp not relevant for you, that you
# would prefer to ignore:
# /var/opt/FS* 1Gb - warning
# /var/opt/FS*tmp exclude
#
# SO BEWARE, if your last line is like :
# * exclude
# this will be the same like to have the df_mon.cfg empty , ALL WILL BE IGNORED
# CMA SUPPORT
#========================================
# You can access to the EventType and EventTypeInstance using teh [ ] brackets statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType leave EventTypeInstace to default value [OBJECT ,EventType ]
# But you can never use more than three fields or less than 1
# Fields to be separated by commands and blank spaces not allowed
# [ OS, OS, SapApp ]
# * 95 95 warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#############################################################################
#################
### EXAMPLES #
#################

# disable this module
# disable = yes

# this module allow to run after every 10 minutes
# interval = 10

# Monitor all filesystems
#* 95 95 warning

# exclude the /tmp
#/tmp exclude

# Create different severities for different thresholds for the same filessytem
#/ 80 100 major
#/ 95 100 critical

# Set an action when the disk space is 99% (inodes are ignored)
#/tmp 99 - major *ACTION gzip /tmp/*.log /tmp/*.txt

# Use the GROUP option
#/usr 80 2 major ; FILESYS

# Set an absolute threshold, if there is less than 1Gb of disk free space, then WARNING
# with GROUP FILESYS, and then, execute an action that stores the warning message in a log
#/tmp 1Gb WARNING ; FILESYS
#*ACTION echo !MESSAGE > kk.log ; date >> kk.log

# Use a percent treshold an execute an action (note the syntax, is allowed all in one line)
#/home 99% 1 *ACTION echo !MESSAGE > kk.log ; date >> kk.log

# Another example of ACTION
#/tmp 12 10 *ACTION rm /tmp/*pdf

# Allowed, but really, really dangerous, if this is the last line is like disabling the whole mon
#* exclude

# Using the Cluster Awareness, in this case, ONLY is the package pkg3 is running in this
# node, the /tmp will be checked. And if threshold exceeded, then it will execute an acion
#/tmp 1 1 WARNING *PACKAGE pkg3
#*ACTION echo "hola" > kk.log

#####################
# End of examples #
#####################

###############################
# Start your config from here #
###############################
#REARM = true

* 95 95 warning
* 98 98 major

#############################################################################
# end of df_mon.cfg
#############################################################################
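
Before tuning the two active lines (* 95 95 warning and * 98 98 major), it helps to see which filesystems would trip a given percentage. The filter below is a hypothetical sketch that works on df -P output; it is not part of df_mon and covers percentage thresholds only, not the absolute Mb/Gb form.

```shell
# Read `df -P` output on stdin and print every mount point whose usage
# exceeds LIMIT percent. Mount points containing spaces are not handled
# in this sketch.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 > limit + 0) print $6, pct "%"
    }'
}

# Usage: df -P  | over_threshold 95     (disk space)
#        df -Pi | over_threshold 95     (inodes; -i is a GNU df option)
```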

dmsg_mon.cfg

###############################################################################
#@(#) $Id: dmsg_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@(#) $Rev: 2132 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2014-08-22 14:47:32 +0800 (Fri, 22 Aug 2014) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

###############################################################################
#
# File: dmsg_mon.
# [disable = yes|no]
# [interval = <minutes>]
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes

# Description: strings listed here don’t generate an ITO message for dmesg
# Syntax: just list the strings, one line for each
# !!! all dmesg lines matching one of the listed strings
# are taken out of monitoring !!!
#
# Example:
#
# hardware path
#
# If the string “hardware path” is listed, all dmesg lines matching (containing)
# the string “hardware path” are ignored for monitoring purposes.
# Still, the dmesg history contains these lines, but no message is generated.
#
###############################################################################

###############################################################################
# End of dmesg_mon.cfg
###############################################################################

hw_mon.cfg

#############################################################################
#@ $Id: hw_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@ $Rev: 2149 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
##############################################################################
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#[ignore string]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed

# ignore string
#===============
# Any command output that matches an ignore string is not recorded in hw_mon.log, which means that kind of hardware error is ignored

# The lines below are predefined strings that can be used as ignore strings; uncomment a line to ignore that kind of error
# Please don’t modify the predefined strings below; only comment or uncomment them.

#REARM = true

#Power Supply Error
#FAN Error
#Thermal Sensor Error
#Memory Failed
#CPU Failed
#Physical Drive Failed
#Drive Array Accelerator Battery Failed

kts_mon.cfg

##################################################################################
#@ $Id: kts_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@ $Rev: 2149 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
##################################################################################
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]

# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# PARAMETERS
# THRESH_NP=
# THRESH_NI=
# THRESH_NF=
# THRESH_NK=
#
# THRESH_NP stands for the size of the kernel table for running processes
# THRESH_NI stands for the size of the kernel table for inodes
# THRESH_NF stands for the size of the kernel table for open files
# THRESH_NK stands for the size of the kernel table nkthreads (HP-UX only)
# is a positive integer that states the percentage (please don’t add the % sign)
#
# Support
# Linux, solaris, HPUX supported
# but not all parameters are available
# Those not supported will be simply discarded
# Please, refer to documentation to know which ones are supported
#
#
#REARM = true

THRESH_NP=80
THRESH_NI=101
THRESH_NK=80
#THRESH_NF=70
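
As a rough illustration of how these percentage thresholds could be read, here is a small Python sketch that parses the uncommented THRESH_* lines and compares them with usage figures. The function names and the sample usage numbers are made up for the example and are not part of UXMON; only the KEY=VALUE syntax comes from the file above.

```python
def parse_thresholds(text):
    """Return {name: percent} for the uncommented THRESH_* lines."""
    thresholds = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no settings
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("THRESH_"):
            thresholds[key] = int(value.strip())
    return thresholds


def exceeded(thresholds, usage):
    """List the tables whose usage (in percent) is above the configured limit."""
    # Assumption: a strict "greater than" comparison; the file does not say.
    return sorted(k for k, limit in thresholds.items() if usage.get(k, 0) > limit)


KTS_CFG = """
#REARM = true
THRESH_NP=80
THRESH_NI=101
THRESH_NK=80
#THRESH_NF=70
"""

thresholds = parse_thresholds(KTS_CFG)
# Hypothetical usage figures, in percent of each kernel table.
sample_usage = {"THRESH_NP": 85, "THRESH_NI": 100, "THRESH_NK": 10}
print(exceeded(thresholds, sample_usage))
```

Under this reading, THRESH_NI=101 can never be exceeded by a percentage, which would effectively switch the inode table check off. That is an inference from the value, not something the file states.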

loop_mon.cfg

###############################################################################
#@(#) $Id: loop_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
# #############################################################################
#############################################################################
# Syntax:
# [REARM = TRUE|FALSE]
# “DISABLE =[yes|no]”
# [interval = ]

# CPU_THRESHOLD =
# TIME_THRESHOLD =
#
# “EXCEPT_PRC= ”
# “EXCEPT_PRC= ”
################################
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# ENABLE / DISABLE THIS MODULE
#===============
# If you configure
# DISABLE =yes
# loopmon will be disabled, else loopmon will be enabled. The default configuration is enabled

# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed

##################################
# FILTERING PROCESSES
# If you configure:
# EXCEPT_PRC=processname1
# EXCEPT_PRC=processname2
# EXCEPT_PRC=processname3
# Which means the processes processname1, processname2, processname3 will be ignored.
##################3
#
# START YOUR CONFIG FROM HERE
#REARM = true

DISABLE = yes

lp_mon.cfg

###############################################################################
#@(#) $Id: lp_mon_aix_linux_solaris.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

##############################For AIX and Solaris ####################################
#For AIX and Solaris
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# lpsched_check=YES|NO
## [queue_length=] [disable_check=YES|NO] [severity] [;]
# exclude [,…]
#
# - lpsched_check - If “YES”, check whether the printing system is available (qdaemon). If “NO”, don’t check it and the monitoring is disabled.
# - queue_length - Max number of pending requests
# - disable_check - Check if printer or spooling is disabled
# - severity - One of: critical, major, minor, warning
# - exclude - Skip the check for this printer
#
# Examples
# ——-
# LPSCHED_CHECK = YES critical TT_PRINTER
# * queue_length = 20 disable_check = YES major TT_PRINTER
#
# ——
# LPSCHED_CHECK = NO
#
# ——
# LPSCHED_CHECK = YES
# myprint disable_check = YES major LP ; 0800-1800 *
#
####################################################################################
#REARM = true

lpsched_check=NO
#* queue_length=30 disable_check=YES

####################################################################################
# end of lp_mon.cfg
####################################################################################

md_mon.cfg

#############################################################################
#@ $Id: md_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@ $Rev: 2175 $
#@ $Author: baoliz $
#@ $Date: 2015-04-22 01:30:55 +0800 (Wed, 22 Apr 2015) $
#@ $LastChangedBy: baoliz $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# exclude
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# exclude
#========================
# The “exclude ” syntax is used to force entries of the device name or UUID to be ignored.
#

############################################################################
# For example:

# disable this module
# disable = yes

# the mdmon will execute every 5 minutes.
# interval = 5

# Configurable timeout for hanging command ‘mdadm’, default is 120s
# cmd_timeout = 120

# exclude some device
# exclude md1
# exclude md2
# exclude 008ef942:540c0f64:ad58b352:a70017dd

#############################################################################
# end of md_mon.cfg
############################################################################

mp_mon.cfg

#############################################################################
#@ $Id: mp_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@ $Rev: 2175 $
#@ $Author: baoliz $
#@ $Date: 2015-04-22 01:30:55 +0800 (Wed, 22 Apr 2015) $
#@ $LastChangedBy: baoliz $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [ignore_hba = yes|no]
# [ignore_san = yes|no]
# [ignore_alua = yes|no]
# exclude
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# exclude
#========================
# The “exclude ” syntax is used to force entries of the device name to be ignored.
#
# ignore_hba
#========================
# Default is no. If set yes, disable HBA redundancy check
#
# ignore_san
#========================
# Default is no. If set yes, disable SAN switch redundancy check
#
# ignore_alua
#========================
# Default is no. If set yes, disable ALUA redundancy check
#
############################################################################
# For example:

# disable this module
# disable = yes

# the mpmon will execute every 5 minutes.
# interval = 5

# disable HBA redundancy check
# ignore_hba = yes

# disable SAN switch redundancy check
# ignore_san = yes

# disable ALUA redundancy check
# ignore_alua = yes

# exclude some multipath device name
# exclude eva1_NA2log
# exclude eva1_sbd1

#############################################################################
# end of mp_mon.cfg
############################################################################
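
Since the alert at the top of this post comes from mpmon, here is a minimal Python sketch of how the directives in mp_mon.cfg could be interpreted. The parser, the `should_check` helper and the defaults for `disable` are assumptions for illustration (the file only states that the `ignore_*` options default to no); the device names are the ones used in the example comments above.

```python
def parse_mp_mon(text):
    """Parse mp_mon.cfg-style text into (settings, excluded device names)."""
    # Assumed defaults; only the ignore_* defaults are documented in the file.
    settings = {"disable": "no", "ignore_hba": "no",
                "ignore_san": "no", "ignore_alua": "no"}
    excluded = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.lower().startswith("exclude "):
            # "exclude <device>" lines accumulate into a list
            excluded.append(line.split(None, 1)[1].strip())
        elif "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip().lower()] = value.strip().lower()
    return settings, excluded


def should_check(device, settings, excluded):
    """A device is checked unless the module is disabled or the device is excluded."""
    return settings["disable"] != "yes" and device not in excluded


SAMPLE = """
interval = 5
ignore_hba = yes
exclude eva1_NA2log
exclude eva1_sbd1
"""

settings, excluded = parse_mp_mon(SAMPLE)
print(settings["ignore_hba"], excluded)
print(should_check("eva1_sbd1", settings, excluded))
```

With a configuration like SAMPLE, the two excluded multipath devices would be skipped while every other device would still be checked for path redundancy.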

nfs_mon.cfg

###############################################################################
#@(#) $Id: nfs_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: nfs_mon.cfg
# Description: The NFS Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

#############################################################################
#—————————–
# DESCRIPTION
# This monitor checks the NFS file systems that are found already mounted. The check
# is a read-only test (df command) that is performed through
# a forked child.
#
# ————————
# The configuration is per line based
# CONFIGURATION
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [GROUP]
# [GROUP , EventType]
# [GROUP , EventType, EventTypeInstance]
#
# exclude
# [*ACTION action]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# FS File system
#===================
# The local file system where the NFS export is expected to be mounted.
# If it does not exist, it will be ignored silently
#
# The file system field also supports the * character.
# *
# will expand * to all NFS file systems defined as auto mount in /etc/fstab (HP-UX/Linux), /usr/sbin/lsfs (AIX) or /etc/vfstab (Solaris).
# *
# will check the utilization of all NFS file systems that are already mounted.
# /user/*
# will check the file systems whose names begin with “/user/”.
#
#
# POLLING|POLLING,TIMEOUT|UTILIZATION
#================================
# POLLING|POLLING,TIMEOUT
# The format: $1 or $1,$2
# $1 is the number of script runs to wait before performing the check. For instance, if you know
# this script is triggered by the ITO template every 15 minutes, and the value
# you set is 10, then the automounter will be checked (and therefore mounted, if it’s
# not mounted) every 15*10 minutes = 150 minutes, approx.
# Please don’t set this value too low.
#
# $2 is the number of seconds after which the child process times out.
#
# UTILIZATION
# The format: x% or x(Mb|kb|Gb)
# x is the file system utilization threshold: x% checks the used space and x(Mb|kb|Gb) checks the free space.
# x%: for example, 10% means that if the utilization of the file system exceeds 10%, an alarm is sent to OVO.
# x(Mb|kb|Gb): for example, 20Mb means that if the free space of the file system is less than 20Mb, an alarm is triggered.
#
#
# EXCLUDE. Ignore filesystems
#=====================================
# you can set the directive Filesystem Exclude, for instance
# /tmp exclude
# /var/opt/SAP* exclude
#
# These file systems will be ignored: they are not checked and no action is executed
# However, note that the order IS RELEVANT, for instance
# /tmp 1 warning
# /tmp exclude
# will cause the /tmp be ignored, but
# /tmp exclude
# /tmp 1 warning
# will make the /tmp be considered.
#
#
# SEVERITY
#===================
# The SEVERITY can be one of the following:
# WARNING, MINOR, MAJOR, CRITICAL (case insensitive)
# If not given explicitly, warning is assumed.
# Please be considerate with the severities; think twice before using CRITICAL, the most severe.
#
# SCHEDULE
#=====================
# You can define the time frame for each line/configuration by giving a specific schedule.
# This means that a file or configuration will only be considered when the current time falls
# within the scheduled time configured
# The format is – , some examples are:
# 0000-2400 * The basic configuration: any day, at any time
# 0900-1800 1-5 Check from 09:00 to 18:00, Monday to Friday
# 0000-2400 6,0 Any hour, Saturday and Sunday only
#
#
# GROUPING
#==============================
# You can group the alarms by giving them the same GROUP. In that case
# only the MOST SEVERE ALARM will be reported and the others will be masked by it.
# If you don’t set a group, the group NONE is assumed.
# Even if a message is masked by a more critical message
# in the same group, its action WILL BE EXECUTED, and a proper log message
# will be written.
#
# ACTIONS
#================
# You can declare an ACTION that will be triggered if the threshold is
# exceeded. The ACTION follows the syntax *ACTION
# Examples of allowed syntax:
# /tmp 1 warning mygroup
# *ACTION mount /tmp /tmp
#
# /home/userx 20% warning
# *ACTION echo "!FSNAME" >> /tmp/nfs.log
#
# Therefore, the ACTION can be on a separate line or directly after the
# filesystem declaration (all on one line).
# In addition, some variables are replaced before the action
# is executed:
# !FSNAME (filesystem name)
# !GROUP (the group used)
# !MESSAGE (the same message that will be written in the logfile)
# !REASON (STAT or UTILIZATION, depending the reason that triggered the alarm)
# !THRESHOLD (The Threshold that has been exceeded)
# !CURRENT (The current value compared with the threshold)
# !SEVERITY (The severity declared in this alarm)
#
#
# General Considerations and Defaults
#=========================================
#
# If no line is present, no NFS will be checked, so no messages at all
# It is highly recommended that file systems managed by the automounter have a large polling interval,
# no less than 20
# In any case, a polling interval of less than 3 is discouraged
# If Group is not set, the default NONE is assumed
# If severity is not set, the default WARNING is assumed
#
#
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such a line is: [ OBJECT , EventType, EventTypeInstance ]
# And the default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType and leave EventTypeInstance at its default value [OBJECT ,EventType ]
# But you can never use more than three fields or fewer than one
# Fields are to be separated by commas and blank spaces are not allowed
# [ OS, OS, SapApp ]
# * 95 95 warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#EXAMPLES
#===========================
#REARM = true

# disable this module
# disable = yes

# this module is allowed to run every 10 minutes
# interval = 10

# Examples
#
#/mnt/tmp 10 critical
#/mount/m1 20 minor 0000-2400 * MYGROUP
# * 10 major
#
# ,
#/mnh 8, 20 mygroup
#
#
#/user/baoliz 20% mygroup
#*ACTION rm /user/baoliz/*.log
#/user/baoliz 200Mb minor 0000-2400 * MYGROUP
#/user/* 50% mygroup
#/user/who exclude
#
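
The UTILIZATION rule described in this file (x% looks at used space, x(Mb|kb|Gb) looks at free space) can be sketched in Python as follows. This is only an illustration of the rule as documented in the comments: the function names are invented, and treating 1 Mb as 1024*1024 bytes is an assumption the file does not confirm.

```python
import re

# Assumption: binary units. The config comments do not define the unit sizes.
UNIT_BYTES = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_utilization(spec):
    """Return ("used_percent", n) or ("free_bytes", n) for specs like 20% or 200Mb."""
    match = re.fullmatch(r"(\d+)(%|kb|mb|gb)", spec.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a utilization threshold: {spec!r}")
    number, unit = int(match.group(1)), match.group(2).lower()
    if unit == "%":
        return ("used_percent", number)
    return ("free_bytes", number * UNIT_BYTES[unit])


def alarm(spec, used_percent, free_bytes):
    """Percent thresholds fire on high usage, size thresholds on low free space."""
    kind, limit = parse_utilization(spec)
    if kind == "used_percent":
        return used_percent > limit
    return free_bytes < limit


# Hypothetical file system state: 35% used, 150 MiB free.
print(alarm("20%", used_percent=35, free_bytes=150 * 1024 ** 2))
print(alarm("200Mb", used_percent=35, free_bytes=150 * 1024 ** 2))
print(alarm("200Mb", used_percent=35, free_bytes=500 * 1024 ** 2))
```

The two forms point in opposite directions, which is easy to miss when reading lines such as `/user/baoliz 20% mygroup` and `/user/baoliz 200Mb minor` side by side.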

nic_mon.cfg

############################################################################
# Description of parameters
# ——————————————————————-
#Syntax:
#
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
# [msg_group = ] #Specific message group here.
# [ping_retry = ] #Default ping_retry value is 2 even without ping_retry setting, which means default timeout is 2*5 seconds.
# With ping_retry setting, timeout will be *5 seconds, make sure the number is more than 2.
#AutoDiscovery=ENABLED #To discover the Network Interfaces as well as to validate the network connections and routing table.
#AutoDiscovery=DISABLED #To stop auto-discovery and manually configure and monitor Network Interfaces
# #Specific Network Interface to monitor along with FQDN and severity
#*EXCLUDE_NIC / #Excluding some specific Network Interface(s) or Gateway(s) from monitoring
#*CLUSTER_NIC #Monitor cluster Network Interface(s).
#############################################################################
#REARM = true

AutoDiscovery=DISABLED
#############################################################################

ntp_mon.cfg

###############################################################################
#@(#) $Id: ntp_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

############################################
# NTP_MON.CFG
# DESCRIPTION
#——————-
# Monitoring the NTP Daemon
# You can configure the monitoring to watch over
# the offset reported by the ntpq -p command
#
# PARAMS DESCRIPTION
#——————————
# [REARM = TRUE|FALSE]
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# “DISABLE =[yes|no]” : This is the switch for ntpmon running
# NTP_OFFSET_CRITICAL, NTP_OFFSET_WARNING set the critical and warning offset thresholds for the active peer; default values are 500 and 250 ms
# NTP_STRATUM_CRITICAL, NTP_STRATUM_WARNING set the stratum thresholds for peers whose status is one of “*”, “o”, “+” or “#”; the default values are 16 and 10
# NTP_ONLINE_TIME: ntpmon will not check anything if the ntpd process has not been running for more than NTP_ONLINE_TIME (minutes); default value is 10 (min)
# ALARM_DELAY: ntpmon allows the alarm to be delayed by the given number of minutes.
#——————————
# PARAMS REQUIRED
#———————————–
#
# CONFIG FILE
#————————–
############################################
#
# EXAMPLE:
#
#
#################################################################
#
# Set your configuration from here
#REARM = TRUE

DISABLE = NO
NTP_OFFSET_CRITICAL 530
NTP_OFFSET_WARNING 230
NTP_STRATUM_CRITICAL 15
NTP_STRATUM_WARNING 12
NTP_ONLINE_TIME 3
#ALARM_DELAY 60

# Set your configuration from here
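
To make the threshold pairs above more concrete, here is a hedged Python sketch that reads the `NAME value` lines and classifies an offset or stratum reading. The default values are the ones quoted in the comments of this file; the function names and the use of "greater than or equal" for the comparison are assumptions, since the file does not describe how ntpmon compares values.

```python
# Defaults as documented in the comments of ntp_mon.cfg.
DEFAULTS = {
    "NTP_OFFSET_CRITICAL": 500,
    "NTP_OFFSET_WARNING": 250,
    "NTP_STRATUM_CRITICAL": 16,
    "NTP_STRATUM_WARNING": 10,
}


def parse_ntp_mon(text):
    """Overlay the uncommented threshold lines on top of the documented defaults."""
    values = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.replace("=", " ").split()
        if len(parts) == 2 and parts[0] in values:
            values[parts[0]] = int(parts[1])
    return values


def classify(value, warning, critical):
    """Assumed comparison: at or above a threshold triggers that severity."""
    value = abs(value)  # an offset can be negative; only its size matters here
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


NTP_CFG = """
DISABLE = NO
NTP_OFFSET_CRITICAL 530
NTP_OFFSET_WARNING 230
NTP_STRATUM_CRITICAL 15
NTP_STRATUM_WARNING 12
NTP_ONLINE_TIME 3
#ALARM_DELAY 60
"""

cfg = parse_ntp_mon(NTP_CFG)
# Hypothetical readings: an offset of -300 ms and a peer at stratum 3.
print(classify(-300, cfg["NTP_OFFSET_WARNING"], cfg["NTP_OFFSET_CRITICAL"]))
print(classify(3, cfg["NTP_STRATUM_WARNING"], cfg["NTP_STRATUM_CRITICAL"]))
```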

ps_mon.cfg

###############################################################################
#@(#) $Id: ps_mon.cfg 2194 2015-05-15 12:57:28Z baoliz $
#@(#) $Rev: 2194 $
#@(#) $Author: baoliz $
#@(#) $Date: 2015-05-15 20:57:28 +0800 (Fri, 15 May 2015) $
#@(#) $LastChangedBy: baoliz $
# #############################################################################

#############################################################################
# File: ps_mon.cfg
# Description: The Unix process Monitor Configuration file
# Package : RMM UXMON
#############################################################################

#############################################################################
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [ACTION_TIMEOUT = ]
# [ ] []
#
# [*PINFO ]
# [*PUSER ]
# [*ACTION ]
# [*PACKAGE ]
# [*IPISLOCAL ]
# [*ELAPSED ]
# [*PATH ]
# [*FILEEXISTS ]
# [*ARGS ]
# [*EXACT_ARGS ]
# [*WITHOUT_ARGS ]
# [*ZONE ]
# [*DURATION ]
#
# = n | n- | n-m
# n : exactly n instances
# n- : n or more instances
# n-m : n to m instances
#
# = – [,]
#
# = critical | major | minor | warning
#
# = Free Text without blanks Max 8 chars
# This text will be used in the ITO message to set the OBJECT ITO message field
# so it’s possible to use it in order to define the ITO-OVSD mappings
# It is also used for message grouping. Only one message of a group will
# be reported, the one with the highest severity; the others will be masked (although
# their respective actions will be executed)
#
#
# Note 1: Some processes change names after they are invoked so be sure
# to use the name as listed by ps OS command
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# ACTION
#===================
# You can declare an ACTION, that will be triggered if the threshold is
# exceeded. The ACTION follows the syntax *ACTION
# Examples of allowed syntax:
# *ACTION ps -e -o pid !PID
#
# Therefore, the ACTION can be on a separate line or directly after the
# process declaration (all on one line).
# In addition, some variables are replaced before the action
# is executed:
# !PROCESS (process name)
# !GROUP (the group used)
# !MESSAGE (the same message that will be written in the logfile)
# !REASON (SPACE or INODE, depending the reason that triggered the alarm)
# !THRESHOLD (The Threshold that has been exceeded)
# !CURRENT (The current value compared with the threshold)
# !SEVERITY (The severity declared in this alarm)
# !PID (The pid number)
#
#
# ACTION_TIMEOUT
#=======================
# An action process that exceeds the timeout will be killed. If not specified, the default is 15 minutes.
#
#
# PACKAGE. Cluster Awareness
#=============================
# There is another directive, *PACKAGE pkgname, that can be used to inform PSMON that
# this resource is managed by a cluster, and more specifically is included in the package named
# pkgname
# Before testing that process, PSMON checks whether the cluster package is in fact
# running on this local node; if yes, it processes it like any other process. If it’s not
# running on this node (and is therefore running on another one), it ignores this process

# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP

# DURATION
#=============================
# If DURATION is configured, the ticket is triggered after DURATION minutes and is held back within those DURATION minutes.

# GROUPING
#====================
# You can group the alarms by giving them the same GROUP. In that case
# only the MOST SEVERE ALARM will be reported
# If you don’t set a group, the group NONE is assumed.
# Even if a message is masked by a more critical message
# in the same group, its action WILL BE EXECUTED, and a proper log message
# will be written.
#
# ACTION. AUTORECOVERY TOTAL
#==============================
# Each time an alarm happens, an action (if defined) is triggered.
# However, a second check will happen, so this monitoring is based on two phases
# first, checks the alarms, and triggers the actions if any.
# second, evaluates the alarms, and, only those who persist will be logged
# in that way, ACTION can help to automatize the maintenance and reduce the number of alarms.
#
# FILTER. PUSER
#=================
# PUSER : This directive helps to filter by the user owner (real user or effective user) of the process. If you set this,
# only processes with that user name will be considered, for instance
# httpd warning 1-
# *PUSER webuser
#
# FILTER. PATH
#==================
# PATH: You can filter by PATH. If you set the path /usr/bin for a process P
# and the system finds /opt/bin/P, it is ignored. It works in the same way as PUSER
# process warning 1-
# *PATH /usr/bin
#
# FILTER. FILEEXISTS
#==================
# FILEEXISTS: This is used to check whether the process binary file exists. If the binary file doesn’t exist, the process check is skipped without an alarm.
#
# process warning 1-
# *FILEEXISTS /usr/bin/process
#
# FILTER. ARGS
#====================
# ARGS: A process can have arguments or switches, a way to discriminate between them
# is using this directive.
# *ARGS -d root
# *ARGS file=myfile.log
# *ARGS argu # “argu” can match, “argument” also can match
# *ARGS /.*/test.sh # wildcard .* can be used
#
# FILTER. EXACT_ARGS
#====================
# EXACT_ARGS: much like the ARGS directive, but with EXACT_ARGS the arguments must match exactly, and
# a longer string that merely contains this argument as a substring no longer matches, for example
# *EXACT_ARGS argu # “argu” matches, “argument” no longer matches
#
# FILTER. WITHOUT_ARGS
#====================
# WITHOUT_ARGS: A process can be running without arguments or switches, for example
#
# Xvnc major 0 SECURITY
# *WITHOUT_ARGS -localhost
#
# That means only 0 Xvnc processes can be running without the argument -localhost; in other words, if Xvnc is running it must be with
# the argument -localhost
#
# FILTER. ZONE
#======================
# ZONE: Some operating systems have zones, separate spaces for running processes,
# as is the case with SOLARIS 10 and its ZONES. By default PSMON looks at all processes
# available, but if you want to focus on one specific ZONE only, you can use this option
#
#
# ELAPSED TIME directive
#==========================
# ELAPSED : You can set an alarm to check how much time a process has been running
# in minutes, hours or days, for instance
# *ELAPSED 5m
# *ELAPSED 4d
# *ELAPSED 36h
# In case any instance of that process exceeds the threshold you will get an alarm
#
#
# RESOURCE CONSUMPTION. PINFO directive
#=======================================
# PINFO: You can monitor how much total CPU a process is taking, or how much
# memory (virtual memory) it is taking. In the case of memory, note this value
# is platform dependent, and its meaning is the VSZ field of the platform’s ps command
# http 1-
# *PINFO 80 1900
#
# Note that the CPU means the total cpu , with all cpu’s averaged, so ,
# in case you have 4 cpu a value of 25 means 100% of one CPU. This is not
# exact, but it gives a good approximation.
#
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such a line is: [ OBJECT , EventType, EventTypeInstance ]
# And the default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType and leave EventTypeInstance at its default value [OBJECT ,EventType ]
# But you can never use more than three fields or fewer than one
# Fields are to be separated by commas and blank spaces are not allowed
# [ OS, OS, SapApp ]
# process 1- warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#############################################################################

#############################################################################
# Examples
# REARM = true
# disable this module
# disable = yes

# this module is allowed to run every 10 minutes
# interval = 10

# Check that at least one httpd process is running, assign to group WEB, filter
# by user apache. And in case the threshold is exceeded, execute the action.
# httpd 1- WEB
# *PUSER apache
# *ACTION echo !PROCESS > logactions.log
#
#

#####################
# End of examples #
#####################

###############################
# Start your config from here #
###############################

#############################################################################
# end of ps_mon.cfg
#############################################################################
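
The instance-count notation (n, n-, n-m) and the *ELAPSED values (5m, 36h, 4d) documented in ps_mon.cfg are compact, so here is a small Python sketch of one way to interpret them. The helper names are invented for this example and the code only mirrors the syntax described in the comments; it is not how psmon itself is implemented.

```python
import re


def parse_count(spec):
    """Turn n, n- or n-m into an inclusive (low, high) pair; high is None if open-ended."""
    match = re.fullmatch(r"(\d+)(-(\d*))?", spec.strip())
    if match is None:
        raise ValueError(f"not an instance count: {spec!r}")
    low = int(match.group(1))
    if match.group(2) is None:
        return (low, low)            # "n": exactly n instances
    high = int(match.group(3)) if match.group(3) else None
    return (low, high)               # "n-": n or more, "n-m": n to m


def count_ok(spec, running):
    """True when the number of running instances satisfies the spec."""
    low, high = parse_count(spec)
    return running >= low and (high is None or running <= high)


def elapsed_minutes(spec):
    """Convert an *ELAPSED value such as 5m, 36h or 4d into minutes."""
    factor = {"m": 1, "h": 60, "d": 24 * 60}
    return int(spec[:-1]) * factor[spec[-1]]


print(parse_count("1-"), parse_count("2-4"), parse_count("0"))
print(count_ok("1-", 0))   # e.g. "httpd 1-" with no httpd running
print(elapsed_minutes("36h"))
```

Read this way, the `Xvnc major 0 SECURITY` example with `*WITHOUT_ARGS -localhost` raises an alarm as soon as a single matching process is found, because the spec `0` allows exactly zero instances.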

rc_mon.cfg

#!/usr/bin/perl
##################################################################################
#@(#) $Id: rc_mon.cfg-all-other 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
##################################################################################
#
# Resource Monitoring configuration file
#__________________________________________
#
#
# In this file you can configure UXMONrcmon by giving thresholds to the metrics
# you have defined in the metrics configuration file. By setting these thresholds
# you can be warned with an OVO alarm when the metric value goes above or below
# such thresholds.
#
# Syntax description
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
#———————-
# You can use the [GROUP] label to define a default GROUP for the underlying lines; however, note that
# in this current release (UXMON 1.2) it is useless. It is simply there for future versions. For now it does
# no harm to have it there, but it will not do anything.
#
# METRIC CATEGORY INSTANCE SEVERITY THRESHOLD APPLICATION OBJECT SCHEDULE NODEGROUP
# METRIC: the metric name, enclosed in " chars, as defined in the Metrics configuration file. Case sensitive.
# CATEGORY: Each metric must be within a category, which defines a subset of metrics. This avoids name collisions,
# as two metrics with the same name can coexist as long as they belong to different categories
# (as defined in the METRICS CONFIGURATION FILE)
# INSTANCE: A metric can measure several “instances” of one concept; for disk space, for example, each file system is an instance
# Note that the instances must be returned by the execution of the metric defined in the METRICS CONFIGURATION FILE
# SEVERITY: Typical values: Critical, Major, Minor, Warning
# THRESHOLD: You can define here what is the edge that if crossed will cause an alarm. You can set > or 4.5 default default Sun-Sat@00:00-23:59 #NodeGroup: UX_AR_test
#"File System Total (MB)" [diskspace] * warning >100Mb default group Sun-Sat@0:00-23:59 #NodeGroup: UX_AR_test
#
# Note: these examples were given to illustrate the syntax; if you want to use them, make sure such Metrics and Categories are defined in the
# METRICS configuration file

#
# Threshold definition for kernel parameters monitoring (not all are monitored on all platforms !!!)
# For monitored parameters see the metrics definition file UXMONmetrics.cfg and comment out unnecessary lines below.
#
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NP Major >80 default default Sun-Sat@0:00-23:59 #NodeGroup: ktsmon
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NI Major >6 default default Sun-Sat@00:00-23:59 #NodeGroup: ktsmon
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NF Major >6 default default Sun-Sat@00:00-23:59 #NodeGroup: ktsmon
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NK Major >6 default default Sun-Sat@00:00-23:59 #NodeGroup: ktsmon

#
# Threshold definition for user quota monitoring
#
#"User Quota Check (%)" [quota] user1 Major >90 default default Sun-Sat@0:00-23:59 #NodeGroup: uq_mon
#"User Quota Check (%)" [quota] user2 Warning >80 default default Sun-Sat@0:00-23:59 #NodeGroup: uq_mon

#
# Threshold definition for monitoring of total swap space used (the keyword “total” is mandatory in configuration)
#
#"Total Swap Space Check (%)" [swap] total Major >95 default default Sun-Sat@00:00-23:59 #NodeGroup: swapmon
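
To make the column layout of a threshold line concrete, here is a minimal Python sketch that splits one line into the fields listed in the header comment (METRIC, CATEGORY, INSTANCE, SEVERITY, THRESHOLD, APPLICATION, OBJECT, SCHEDULE, NODEGROUP). It is not the UXMON parser: the regular expression, the function names and the support for a "<" qualifier are assumptions made for illustration, and it expects plain straight double quotes around the metric name.

```python
import re

# Hypothetical pattern for one rc_mon.cfg threshold line; not taken from UXMON itself.
LINE_RE = re.compile(
    r'^"(?P<metric>[^"]+)"\s+'
    r'\[(?P<category>[^\]]+)\]\s+'
    r'(?P<instance>\S+)\s+'
    r'(?P<severity>\S+)\s+'
    r'(?P<op>[<>])(?P<limit>[0-9.]+)\S*\s+'   # e.g. >95 or >100Mb (unit suffix ignored)
    r'(?P<application>\S+)\s+'
    r'(?P<object>\S+)\s+'
    r'(?P<schedule>\S+)'
    r'(?:\s+#NodeGroup:\s*(?P<nodegroup>\S+))?\s*$'
)


def parse_threshold_line(line):
    """Return the fields of a threshold line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # comment lines (leading '#') and blank lines end up here
    fields = match.groupdict()
    fields["limit"] = float(fields["limit"])
    return fields


def is_breached(fields, value):
    """Compare a measured value against the parsed threshold."""
    if fields["op"] == ">":
        return value > fields["limit"]
    return value < fields["limit"]


if __name__ == "__main__":
    sample = ('"Total Swap Space Check (%)" [swap] total Major >95 '
              'default default Sun-Sat@00:00-23:59 #NodeGroup: swapmon')
    print(parse_threshold_line(sample))
```

All the threshold lines in the file above are commented out, so a parser like this would return None for every one of them until the leading # is removed.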

sc_mon.cfg

###############################################################################
#@(#) $Id: sc_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: sc_mon.cfg
#
#############################################################################
#
# configuration for SC monitoring
#
# Description of parameters
# —————————————————————————
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If disable is set to YES (or yes), this module will not run at all
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# RG[0]=XYZ Package name 1
# RG_NODE[0]=ABC Primary node on which the package must run
# Defining it as * disables the check for running on an adoptive node
# RG_SWTCH[0]=1 Set to 1 if Package_switching should be ENABLED
# Set to 0 if Package_switching must not be ENABLED
#
# numbers in [] represent a package
#
#
#############################################################################
# example configuration:
# REARM = TRUE
# RG[0]=sg_pack_one; RG_NODE[0]=sg_node_one; RG_SWTCH[0]=1
# RG[1]=sg_pack_two; RG_NODE[1]=sg_node_two; RG_SWTCH[1]=1
# RG[2]=sg_pack_three; RG_NODE[2]=*; RG_SWTCH[2]=1
#############################################################################
# end of sc_mon.cfg
#############################################################################

scsi_mon.cfg

###############################################################################
#@(#) $Id: scsi_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: scsi_mon.cfg
# Description: The multi path Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
#
################################################################################
#
# Syntax:
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]

# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If disable is set to YES (or yes), this module will not run at all
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
#1. EXCLUDE_DEVICES=
# default: is empty

#2. PROTOCOLS=
# default: PROTOCOLS=fibre_channel

#3. CHECK_UNOPEN_DEVICES=
# default: CHECK_UNOPEN_DEVICES=yes
# The state is from command “scsimgr get_info -D device”

#4. DISABLE_MULTIPATH_MONITORING=
# default: DISABLE_MULTIPATH_MONITORING=no
# If this is set to yes, the monitor is not triggered at all.

#5. CHECK_REDUNDANT=
# default: CHECK_REDUNDANT=no
# If this is set to yes, path redundancy is checked; otherwise it is ignored.

# EXCLUDE_DEVICES
# ==================
# If the argument contains "*", the disk check treats it as a regular expression (e.g. disk1* matches disk1, disk12, disk122). In this case only one argument is allowed.

# If the argument(s) contain no "*", the disk check explicitly excludes the listed disk(s).
# If more than one argument is supplied, "*" cannot be used. Arguments must be separated by commas.
#
#
# PROTOCOLS
#===================
# Supply the device types separated by commas if more than one is specified; otherwise write the single device type without a comma
#############################################################################
#
#
################################################################################
#
# examples
#
#EXCLUDE_DEVICES=disk4 ,disk6,disk7
#PROTOCOLS= parallel_scsi
#CHECK_UNOPEN_DEVICES= yes
#DISABLE_MULTIPATH_MONITORING=no
#
#
#############################################
# Make your CONFIGURATION FROM HERE #
#############################################
#REARM = TRUE
EXCLUDE_DEVICES=
PROTOCOLS= fibre_channel
CHECK_UNOPEN_DEVICES= no
DISABLE_MULTIPATH_MONITORING= no
CHECK_REDUNDANT=no
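
The EXCLUDE_DEVICES rules described in the comments can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the module's code: the function names are made up, and the "*" case is implemented as a shell-style wildcard because that reproduces the documented example (disk1* matches disk1, disk12, disk122); the real module may interpret the pattern differently.

```python
from fnmatch import fnmatchcase


def parse_exclude(value):
    """Split an EXCLUDE_DEVICES value on commas and drop surrounding blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_excluded(device, exclude_value):
    """Return True if the device would be skipped under the documented rules."""
    patterns = parse_exclude(exclude_value)
    # A single argument containing '*' is treated as a pattern (assumption: wildcard match).
    if len(patterns) == 1 and "*" in patterns[0]:
        return fnmatchcase(device, patterns[0])
    # Otherwise the list names the excluded disks explicitly.
    return device in patterns


if __name__ == "__main__":
    for disk in ("disk1", "disk12", "disk2"):
        print(disk, is_excluded(disk, "disk1*"))
    print(is_excluded("disk6", "disk4 ,disk6,disk7"))
```

With the empty EXCLUDE_DEVICES= setting shown above, no device is excluded.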

sg_mon.cfg

###############################################################################
#@(#) $Id: sg_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: sg_mon.cfg
# Description: Check service guard Package monitoring script
# Package : Concorde – UXSM
# Version: A.01.00
#
#############################################################################
#
# configuration for SG monitoring
# Script sg_mon.ksh
#
# Description of parameters
# —————————————————————————
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]

# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If disable is set to YES (or yes), this module will not run at all
#
# interval
#===============
#
# [DISABLE_CLUSTER_LOCK_CHECK = yes|no]
# Automatic checking of the Cluster Lock (Lock Disk, Lock LUN, or Quorum server) is supported for SG versions A.11.17 and later,
# and customized QUORUM_SERVER specifications will be ignored.
# If set to YES (or yes), automatic checking of the Cluster Lock is disabled.
#
#================
# If the module will allow to run after the interval minutes

# PKG[0]=XYZ Package name 1
# PKG_NODE[0]=ABC Primary node on which the package must run
# Defining it as * disables the check for running on an adoptive node
# PKG_SWTCH[0]=1 Set to 1 if Package_switching should be ENABLED
# Set to 0 if Package_switching must not be ENABLED
#
# numbers in [] represent a package
#
# QUORUM_SERVER[0]=server_name Quorum server name
# QUORUM_NODE[0]=node_name node name in the cluster
#
# Global parameter:
# LAN_MON=1 (default) If LAN interfaces should be monitored (up/down).
# LAN_MON=0 If LAN interfaces should not be monitored.
# GROUP=NONE Default is NONE. In that case
# only the MOST SEVERE ALARM is reported
# and the others are masked by it
#############################################################################
# example configuration:
# REARM = TRUE
# PKG[0]=sg_pack_one; PKG_NODE[0]=sg_node_one; PKG_SWTCH[0]=1
# PKG[1]=sg_pack_two; PKG_NODE[1]=sg_node_two; PKG_SWTCH[1]=1
# PKG[2]=sg_pack_three; PKG_NODE[2]=*; PKG_SWTCH[2]=1
# QUORUM_SERVER[0]=server_name_one; QUORUM_NODE[0]=sg_node_name_one
# QUORUM_SERVER[1]=server_name_two; QUORUM_NODE[1]=sg_node_name_two
# LAN_MON=1
# GROUP=MYGROUP
#############################################################################
# end of sg_mon.cfg
#############################################################################
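
The indexed PKG[n] / PKG_NODE[n] / PKG_SWTCH[n] lines (and the RG[n] lines in sc_mon.cfg) follow a simple "NAME[index]=value" pattern, several per line, separated by semicolons. As a rough sketch of how such lines could be read, assuming exactly that pattern and nothing more about the real sg_mon.ksh parser, the hypothetical helper below groups the values by parameter name and index:

```python
import re
from collections import defaultdict

# Hypothetical pattern for one "NAME[index]=value" assignment.
ASSIGN_RE = re.compile(r'^\s*([A-Z_]+)\[(\d+)\]\s*=\s*(\S+)\s*$')


def parse_indexed(lines):
    """Collect NAME[index]=value assignments into {NAME: {index: value}}."""
    result = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        for part in line.split(";"):
            match = ASSIGN_RE.match(part)
            if match:
                name, index, value = match.groups()
                result[name][int(index)] = value
    return {name: dict(values) for name, values in result.items()}


if __name__ == "__main__":
    config = parse_indexed([
        "PKG[0]=sg_pack_one; PKG_NODE[0]=sg_node_one; PKG_SWTCH[0]=1",
        "PKG[2]=sg_pack_three; PKG_NODE[2]=*; PKG_SWTCH[2]=1",
    ])
    for index, package in sorted(config["PKG"].items()):
        print(index, package, config["PKG_NODE"][index], config["PKG_SWTCH"][index])
```

Keeping one dictionary per parameter name avoids mixing the package indexes with the separate QUORUM_SERVER[n] / QUORUM_NODE[n] numbering.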

sshd_mon.cfg

###############################################################################
#@(#) $Id: sshd_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: sshd_mon.cfg
# Description: The sshd Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################
############################################################################
# Description of parameters
# ——————————————————————-
#Syntax:
#
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If disable is set to YES (or yes), this module will not run at all
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# [CUSTOM_SSHD_BIN_LIST = ]
# Default SSHD path is /opt/sshd/sbin/sshd, /usr/local/sbin/sshd, /usr/sbin/sshd, /opt/ssh/sbin/sshd, /usr/lib/ssh/sshd
# [CUSTOM_SSHD_PID_LIST = ]
# Default sshd.pid file path is /var/run/sshd.pid, /var/run/sshd.init.pid, /var/run/sshd-quest.pid, /usr/local/etc/sshd.pid, /var/openssh/sshd.pid,/etc/sshd.pid
# If the sshd binary or sshd.pid file is not in the default list, define a comma-separated list of paths as follows.
# Eg:
# CUSTOM_SSHD_BIN_LIST = /usr/mypath1/sshd, /usr/mypath2/sshd
# CUSTOM_SSHD_PID_LIST = /var/mypath/sshd.pid
#############################################################################
# REARM = TRUE

#############################################################################

svcs_mon.cfg

###############################################################################
# GD UX MON #
#@(#) $Id: svcs_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2012-07-25 18:54:50 +0800
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: svcs_mon.cfg
# Description: The File Activity Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
# The intention of this script is to monitor solaris service status
#
#

################################################################################
#
# Syntax:
# ——————————————————————-
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#[EXCLUDE_SERVICE = ]
#
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If disable is set to YES (or yes), this module will not run at all
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
#===============
# Set specific services to exclude, for instance:
# EXCLUDE_SERVICE = /application/print/server
#############################################################################
# REARM = TRUE

#############################################################################

swap_mon.cfg

###############################################################################
#@(#) $Id: swap_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################
#
# All lines which start with a hash-sign (#) will be ignored
# the whole config file is case-insensitive
#
# THERE ARE NO SYNTAX CHECKS SO MAKE SURE THAT YOUR CONFIGURATION
# IS CORRECT!!!
#
# Every config line has the following layout:
# total [ []]
#
# Every config line must start with “total” because every check is
# performed on the totally free space
#
# The percent used must be given as a positive integer between 0 and 100.
# If the given percentage is exceeded, an alarm
# is raised.
#
# In the third column you have to configure a severity which is used for
# this alert. Possible severities are: warning, major, critical
#
# With the alert type you can configure which alarm will be raised. Possible
# values are:
# B -> Browser
# N -> Browser+Notification
# T -> Browser+Trouble Ticket
# NT -> Browser+Notification+Trouble Ticket
#
# The from-to gives the time of the day when the script shall run.
# The from and the to time are given as 4-digit-numbers in the 24h format
# So “all day long” would be “0000-2400” (in the configline without the quotes!)
# If you don’t configure a from-to time, the script will run all day long
#
# If you have configured the daytime on which the script shall run, you can also
# configure on which days the script will run. The days are given with crontab
# syntax. You can either give multiple days separated with commas from each other
# (e.g. "1,3,4" -> Monday, Wednesday, Thursday) or you can give a span of days
# separated with a hyphen (e.g. “2-4” -> from Tuesday until Thursday).
# In this case the first number has always to be lower than the second one!!!
# As you could already see in the examples, the week starts with “0” for sunday
# and ends with “6” for saturday.
#
# You can configure multiple percent_used entries for the same day(s), the
# script will always raise the alert of the highest threshold that is exceeded
#
#[REARM = TRUE|FALSE]
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
###############################################################################

################################################################################
# Example configuration
###############################################################################
#
#REARM = TRUE

#total percent_used severity Alert FROM-TO Days
total 95 major T 0000-2400 *
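
The scheduling rules in the comments (FROM-TO as 4-digit 24h times, days in crontab syntax with 0 for Sunday, and the highest exceeded threshold winning) can be sketched as follows. This is a hedged illustration in Python, not the swap monitor's code: the function names are invented, missing FROM-TO and Days columns are assumed to mean "all day, every day", and no default is assumed for a missing alert type.

```python
from datetime import datetime


def parse_days(spec):
    """Turn a crontab-style day spec ('*', '1,3,4', '2-4') into a set of day numbers."""
    if spec == "*":
        return set(range(7))
    days = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            days.update(range(int(low), int(high) + 1))
        else:
            days.add(int(part))
    return days


def parse_line(line):
    """Parse 'total percent severity alert FROM-TO days'; return None for other lines."""
    fields = line.split()
    if len(fields) < 3 or fields[0].lower() != "total":
        return None
    window = fields[4] if len(fields) > 4 else "0000-2400"
    start, end = (int(part) for part in window.split("-"))
    return {
        "percent": int(fields[1]),
        "severity": fields[2].lower(),
        "alert": fields[3] if len(fields) > 3 else None,
        "start": start,
        "end": end,
        "days": parse_days(fields[5] if len(fields) > 5 else "*"),
    }


def is_active(entry, when):
    """Check whether an entry applies at the given datetime."""
    cron_day = (when.weekday() + 1) % 7  # Python: Monday=0, crontab: Sunday=0
    hhmm = when.hour * 100 + when.minute
    return cron_day in entry["days"] and entry["start"] <= hhmm < entry["end"]


def pick_alert(entries, used_percent, when):
    """Return the active entry with the highest exceeded threshold, or None."""
    hits = [e for e in entries if is_active(e, when) and used_percent > e["percent"]]
    return max(hits, key=lambda e: e["percent"]) if hits else None


if __name__ == "__main__":
    entries = [parse_line("total 95 major T 0000-2400 *")]
    print(pick_alert(entries, 97, datetime(2015, 10, 26, 8, 13)))
```

With the single configured line above, a major alert of type T (Browser+Trouble Ticket) would be selected whenever swap usage is above 95%, at any time on any day.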

UXMONbroker.cfg

###############################################################################
#@(#) $Id: UXMONbroker.cfg.linux.ovo8 2177 2015-04-29 08:06:01Z zhaofeif $
#@(#) $Rev: 2177 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-04-29 16:06:01 +0800 (Wed, 29 Apr 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#$UXMON_OVO_RELEASE = "02.01.03";

############################
# OVO Agent information
$UXMON_OVO_BASE = "/var/opt/OV";
$UXMON_OVO_TMP = $UXMON_OVO_BASE."/tmp/OpC";
$UXMON_OVO_CMDS = $UXMON_OVO_BASE."/bin/instrumentation";
$UXMON_OVO_DFT_CFG = $UXMON_OVO_CMDS;
$UXMON_OVO_LOG = $UXMON_OVO_BASE."/log/OpC";
$UXMON_OVO_CFG = $UXMON_OVO_BASE."/conf/OpC";

##################################################
# OS Commands and info
# PLATFORM DEPENDENCY
$UXMON_OS_HOSTNAME = hostname();
$UXMON_SG_CMVIEWCL = "/usr/local/cmcluster/bin/cmviewcl";
$UXMON_VT_HAGRP = "/opt/VRTSvcs/bin/hagrp";
$UXMON_CRON_LOG = "/var/spool/mail/root";
$UXMON_SHELL = "/bin/bash";
$UXMON_HACMP_HAGRP = "/usr/es/sbin/cluster/utilities/clRGinfo";

$UXMON_PATH_SGMON = "/usr/local/cmcluster/bin:/opt/cmcluster/bin:/usr/sbin";

################
# Perl runtime to be used internally
$UXMON_OVO_PERL = "/opt/OV/nonOV/perl/a/bin/perl";

#######################
# Default Values
$UXMON_DFT_SEVERITY = "warning";
$UXMON_DFT_GROUP = "NONE";
$UXMON_DFT_EVENTTYPE = "NONE";
$UXMON_DFT_EVENTINSTANCE = "NONE";

$UXMON_SC_SCSTAT = "/usr/cluster/bin/scstat";
$UXMON_MODULE_METRICS_FILE = "UXMONmetrics.cfg";
$UXMON_MODULE_METRICS_PATH = "$UXMON_OVO_CFG/$UXMON_MODULE_METRICS_FILE";

#############################33
# In hours, is the threshold to consider a module is inactive
$UXMON_MODULE_INACTIVE = 10;

################################
# The number of perl processes to consider something is wrong
$UXMON_MAX_PERL_INSTANCES = 100;

##########################
# TIMEOUT for OS commands, in seconds. When a command exceeds this timeout, the 'got block' alarm is triggered.
$UXMON_OS_TIMEOUT = 300 ;

##########################
# TIMEOUT for module runs, given as a multiple of the module's INTERVAL. When it is exceeded, the 'running timeout' alarm is triggered.
$UXMON_MODULE_TIMEOUT = 3;

#############################################################
$UXMON_MODULE_EXEC{dfmon} = "UXMONdfmon";
$UXMON_MODULE_EXEC{psmon} = "UXMONpsmon";
$UXMON_MODULE_EXEC{volmon} = "UXMONvolmon";
$UXMON_MODULE_EXEC{cronmon} = "UXMONcronmon";
$UXMON_MODULE_EXEC{actmon} = "UXMONactmon";
$UXMON_MODULE_EXEC{ntpmon} = "UXMONntpmon";
$UXMON_MODULE_EXEC{nfsmon} = "UXMONnfsmon";
$UXMON_MODULE_EXEC{lpmon} = "UXMONlpmon";
$UXMON_MODULE_EXEC{perfmon} = "UXMONperfmon";
$UXMON_MODULE_EXEC{sshdmon} = "UXMONsshdmon";
$UXMON_MODULE_EXEC{rcmon} = "UXMONrcmon";
$UXMON_MODULE_EXEC{evm} = "UXMONevm";
$UXMON_MODULE_EXEC{loopmon} = "UXMONloopmon";
$UXMON_MODULE_EXEC{uxmon} = "UXMONuxmon";
$UXMON_MODULE_EXEC{advfsmon} = "UXMONadvfsmon";
$UXMON_MODULE_EXEC{dmesg} = "UXMONdmsg";
$UXMON_MODULE_EXEC{bootmon} = "UXMONbootmon";
$UXMON_MODULE_EXEC{scmon} = "UXMONscmon";
$UXMON_MODULE_EXEC{sgmon} = "UXMONsgmon";
$UXMON_MODULE_EXEC{vcmon} = "UXMONvcmon";
$UXMON_MODULE_EXEC{ktsmon} = "UXMONktsmon";
$UXMON_MODULE_EXEC{selfcheck} = "UXMONselfcheck";
$UXMON_MODULE_EXEC{nicmon} = "UXMONnicmon";
$UXMON_MODULE_EXEC{hwmon} = "UXMONhwmon";
$UXMON_MODULE_EXEC{mdmon} = "UXMONmdmon";
$UXMON_MODULE_EXEC{bondmon} = "UXMONbondmon";
$UXMON_MODULE_EXEC{mpmon} = "UXMONmpmon";
$UXMON_MODULE_EXEC{swapmon} = "UXMONswapmon";

@UXMON_LIST_MODULES = keys (%UXMON_MODULE_EXEC);

##############################################################################################
# The complete execution command
foreach $k (@UXMON_LIST_MODULES)
{
$UXMON_MODULE_EXECPATH{$k} = catfile($UXMON_OVO_CMDS,$UXMON_MODULE_EXEC{$k});
}

##########################################################3
# Collect interface, only defined for these three modules
$UXMON_MODULE_INTERFACE{dfmon} = "UXMONcollectDFMON";
$UXMON_MODULE_INTERFACE{psmon} = "UXMONcollectPSMON";
$UXMON_MODULE_INTERFACE{actmon} = "UXMONcollectACTMON";

#################################################
# The logfile
$UXMON_MODULE_LOGFILE{dfmon} = "df_mon.log";
$UXMON_MODULE_LOGFILE{psmon} = "ps_mon.log";
$UXMON_MODULE_LOGFILE{actmon} = "act_mon.log";
$UXMON_MODULE_LOGFILE{volmon} = "vol_mon.log";
$UXMON_MODULE_LOGFILE{cronmon} = "cron_mon.log";
$UXMON_MODULE_LOGFILE{ntpmon} = "ntp_mon.log";
$UXMON_MODULE_LOGFILE{lpmon} = "lp_mon.log";
$UXMON_MODULE_LOGFILE{nfsmon} = "nfs_mon.log";
$UXMON_MODULE_LOGFILE{perfmon} = "perf_mon.log";
$UXMON_MODULE_LOGFILE{dmesg} = "dmsg_mon.hist";
$UXMON_MODULE_LOGFILE{ktsmon} = "kts_mon.log";
$UXMON_MODULE_LOGFILE{sshdmon} = "sshd_mon.log";
$UXMON_MODULE_LOGFILE{rcmon} = "rc_mon.log";
$UXMON_MODULE_LOGFILE{evm} = "evm_mon.log";
$UXMON_MODULE_LOGFILE{loopmon} = "loop_mon.log";
$UXMON_MODULE_LOGFILE{uxmon} = "uxmon.log";
$UXMON_MODULE_LOGFILE{advfsmon} = "advfs_mon.log";
$UXMON_MODULE_LOGFILE{dmesg} = "dmsg_mon.log";
$UXMON_MODULE_LOGFILE{bootmon} = "boot_mon.log";
$UXMON_MODULE_LOGFILE{scmon} = "sc_mon.log";
$UXMON_MODULE_LOGFILE{sgmon} = "sg_mon.log";
$UXMON_MODULE_LOGFILE{vcmon} = "vc_mon.log";
$UXMON_MODULE_LOGFILE{selfcheck} = "uxmon.log";
$UXMON_MODULE_LOGFILE{nicmon} = "nic_mon.log";
$UXMON_MODULE_LOGFILE{hwmon} = "hw_mon.log";
$UXMON_MODULE_LOGFILE{mdmon} = "md_mon.log";
$UXMON_MODULE_LOGFILE{bondmon} = "bond_mon.log";
$UXMON_MODULE_LOGFILE{mpmon} = "mp_mon.log";
$UXMON_MODULE_LOGFILE{swapmon} = "swap_mon.log";

#########################################################
# The complete LOGPATH
foreach $k (@UXMON_LIST_MODULES)
{
$UXMON_MODULE_LOGPATH{$k} = catfile($UXMON_OVO_LOG,$UXMON_MODULE_LOGFILE{$k});
}

################################################################
# The config file
$UXMON_MODULE_CFGFILE{dfmon} = "df_mon.cfg";
$UXMON_MODULE_CFGFILE{psmon} = "ps_mon.cfg";
$UXMON_MODULE_CFGFILE{actmon} = "act_mon.cfg";
$UXMON_MODULE_CFGFILE{volmon} = "vol_mon.cfg";
$UXMON_MODULE_CFGFILE{cronmon} = "cron_mon.cfg";
$UXMON_MODULE_CFGFILE{ntpmon} = "ntp_mon.cfg";
$UXMON_MODULE_CFGFILE{lpmon} = "lp_mon.cfg";
$UXMON_MODULE_CFGFILE{nfsmon} = "nfs_mon.cfg";
$UXMON_MODULE_CFGFILE{perfmon} = "perf_mon.cfg";
$UXMON_MODULE_CFGFILE{ktsmon} = "kts_mon.cfg";
$UXMON_MODULE_CFGFILE{sshdmon} = "sshd_mon.cfg";
$UXMON_MODULE_CFGFILE{evm} = "evm_mon.cfg";
$UXMON_MODULE_CFGFILE{rcmon} = "rc_mon.cfg";
$UXMON_MODULE_CFGFILE{loopmon} = "loop_mon.cfg";
$UXMON_MODULE_CFGFILE{uxmon} = "uxmon.cfg";
$UXMON_MODULE_CFGFILE{advfsmon} = "advfs_mon.cfg";
$UXMON_MODULE_CFGFILE{dmesg} = "dmsg_mon.cfg";
$UXMON_MODULE_CFGFILE{bootmon} = "boot_mon.cfg";
$UXMON_MODULE_CFGFILE{scmon} = "sc_mon.cfg";
$UXMON_MODULE_CFGFILE{sgmon} = "sg_mon.cfg";
$UXMON_MODULE_CFGFILE{vcmon} = "vc_mon.cfg";
$UXMON_MODULE_CFGFILE{selfcheck} = "";
$UXMON_MODULE_CFGFILE{nicmon} = "nic_mon.cfg";
$UXMON_MODULE_CFGFILE{hwmon} = "hw_mon.cfg";
$UXMON_MODULE_CFGFILE{mdmon} = "md_mon.cfg";
$UXMON_MODULE_CFGFILE{bondmon} = "bond_mon.cfg";
$UXMON_MODULE_CFGFILE{mpmon} = "mp_mon.cfg";
$UXMON_MODULE_CFGFILE{swapmon} = "swap_mon.cfg";

#########################################################
# The complete CFG PATH
foreach $k (@UXMON_LIST_MODULES)
{
$UXMON_MODULE_CFGPATH{$k} = catfile($UXMON_OVO_CFG,$UXMON_MODULE_CFGFILE{$k});
}

###############################################################################

#########################
# this line is mandatory
return 1;
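
To see what the three foreach loops in UXMONbroker.cfg produce, the path construction can be mirrored in a short Python sketch. It copies only a few entries from the tables above, and posixpath.join stands in for Perl's catfile; the function name module_paths is made up for this example.

```python
import posixpath

# Base directories as set in UXMONbroker.cfg.
OVO_BASE = "/var/opt/OV"
OVO_CMDS = OVO_BASE + "/bin/instrumentation"
OVO_LOG = OVO_BASE + "/log/OpC"
OVO_CFG = OVO_BASE + "/conf/OpC"

# Subset of the module tables, copied from the file above.
MODULE_EXEC = {"dfmon": "UXMONdfmon", "mpmon": "UXMONmpmon", "swapmon": "UXMONswapmon"}
MODULE_LOGFILE = {"dfmon": "df_mon.log", "mpmon": "mp_mon.log", "swapmon": "swap_mon.log"}
MODULE_CFGFILE = {"dfmon": "df_mon.cfg", "mpmon": "mp_mon.cfg", "swapmon": "swap_mon.cfg"}


def module_paths(name):
    """Return the executable, log and config paths for one module."""
    return {
        "exec": posixpath.join(OVO_CMDS, MODULE_EXEC[name]),
        "log": posixpath.join(OVO_LOG, MODULE_LOGFILE[name]),
        "cfg": posixpath.join(OVO_CFG, MODULE_CFGFILE[name]),
    }


if __name__ == "__main__":
    for kind, path in module_paths("mpmon").items():
        print(kind, path)
```

For mpmon the log path resolves to /var/opt/OV/log/OpC/mp_mon.log, which is the file the alert instruction at the top of this post points to. Note also that the file assigns $UXMON_MODULE_LOGFILE{dmesg} twice; in Perl the later assignment wins, so the effective value is dmsg_mon.log.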

UXMONmetrics.cfg

#!/usr/bin/perl
##################################################################################
#@(#) $Id: UXMONmetrics.cfg-linux 2132 2014-08-22 06:47:32Z zhaofeif $
#@(#) $Rev: 2132 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2014-08-22 14:47:32 +0800 (Fri, 22 Aug 2014) $
#@(#) $LastChangedBy: zhaofeif $
##################################################################################
#
# Resource Monitoring configuration file
#__________________________________________
#
#
# This is the configuration file where you can "build" the metrics to be used in rc_mon.cfg.
# In other words, each metric you use in rc_mon.cfg MUST HAVE its definition here;
# otherwise that line in rc_mon.cfg will be ignored.
#
# The syntax to create a METRIC is as follows:
# SYNTAX
#-------------------------
# file:= categories
# | category categories
#
# category:= defcategory metrics closecategory
# metrics:= metric
# | metric metrics
# metric:= metricname command units anno_cmd closemetric
# metricname:=
# command:= command = TEXT
# units:= units = UNITS
# anno_cmd:= anno_cmd = TEXT
# closemetric:=
# closecategory:= end object
#
# Basically the idea is to define several categories with a MetricCategory line. Within each category the metric name MUST BE UNIQUE to avoid
# collisions. Within each category you can define one or more metrics.
# Each metric consists of a name, a command to be executed, a units definition and an annotation command.
# The command to be executed is the metric itself, and this command must extract the info precisely in
# a well-defined format. See OUTPUT FORMAT OF COMMAND below. In this way, UXMONrcmon will execute the lines you define here
# and the output of each command will be parsed to find out whether the thresholds set in rc_mon.cfg have been exceeded.
# The units field is useful to define the format of what you are working with.
#
# OUTPUT FORMAT OF COMMAND
#----------------------------
# The metric command must write a numeric value for the metric to standard output; decimal points are allowed but text is not. Only numbers.
# Note that if you define a metric that takes info from several instances (see rc_mon.cfg), the output must be one line per instance:
# INSTANCE1 VALUE
# INSTANCE2 VALUE
# ..and so on
# for example
# /var/opt 15
# /var/tmp 20
# …

##
## Some examples
##——————–
#MetricCategory = diskspace
#
#command = df -m|awk '{ if ($1 !~/Filesystem/ && $3>=0) { printf("%s %.2f\n", $NF,$(NF-3));} }'
#units = MB
#anno_cmd = df -m
#
#end object

##MetricCategory = Processor
##
## command = vmstat 2 4 | awk ' { getline;getline;getline;getline;getline; printf ("CPU %d\n",$13); }'
## anno_cmd = vmstat 2 4
##
##end object
#
##
## Metric definition for kernel parameters monitoring
##
#MetricCategory = kernel
#
#command = tail -1 /proc/sys/fs/file-nr | /bin/awk '{printf("NF %.2f\n",$2*100/$3);}'
#units = %
#anno_cmd = tail -1 /proc/sys/fs/file-nr
#
#end object
#
##
## Metric definition for user quota monitoring on all filesystem
##
#MetricCategory = quota
#
##command = repquota -v | awk '{ if ($4 ~ /[[:digit:]+]/) {printf("%s %.2f\n", $1, 100*$3/$4);} }'
#command = repquota -v -a | awk '{ if ($4 ~ /[[:digit:]+]/) {printf("%s %.2f\n", $1, 100*$3/$4);} }'
#units = %
#anno_cmd = repquota -v -a
#
#end object
#
##
## Metric definition for swap monitoring (total swap space used is monitored)
##
#MetricCategory = swap
#
#command = /usr/bin/free -b | grep Swap | awk '{printf("total %.0f\n", $3*100/$2);}'
#units = %
#anno_cmd = /usr/bin/free -b
#
#end object
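
As a rough illustration of the "INSTANCE VALUE" output format and of what the swap metric command computes, here is a small Python sketch. It works on captured text rather than running any command; the function names and the sample numbers are invented for this example, and the handling of a system without swap (returning None instead of dividing by zero) is an addition that the awk one-liner does not have.

```python
def parse_metric_output(text):
    """Read 'INSTANCE VALUE' lines into a dict, skipping anything that does not fit."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            values[parts[0]] = float(parts[1])
        except ValueError:
            continue  # the value column must be numeric
    return values


def swap_used_percent(free_output):
    """Mirror the swap metric: used * 100 / total, taken from the Swap line of 'free -b'."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("Swap"):
            total, used = float(fields[1]), float(fields[2])
            if total == 0:
                return None  # no swap configured; avoid a division by zero
            return round(used * 100 / total)
    return None


if __name__ == "__main__":
    sample = (
        "              total        used        free\n"
        "Mem:     8000000000  4000000000  4000000000\n"
        "Swap:    2000000000   500000000  1500000000\n"
    )
    print("total", swap_used_percent(sample))
    print(parse_metric_output("/var/opt 15\n/var/tmp 20\n"))
```

The instance name printed by the swap command is the literal word total, which is why the matching threshold line in rc_mon.cfg must use total as its INSTANCE.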

UXMONperf.cfg

###############################################################################
#@(#) $Id: UXMONperf.cfg 2178 2015-05-08 03:08:02Z zhaofeif $
#@(#) $Rev: 2178 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-05-08 11:08:02 +0800 (Fri, 08 May 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

ALARM GBL_SWAP_SPACE_UTIL > 90 FOR 10 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_SWAP_SPACE_USED_UTIL total exceeds 90%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_SWAP_SPACE_USED_UTIL total exceeds 90%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_SWAP_SPACE_USED_UTIL > 90% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_CPU_TOTAL_UTIL > 98 FOR 5 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_CPU_SYS_MODE_UTIL > 98% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_CPU_TOTAL_UTIL > 85 FOR 15 MINUTES
START EXEC "echo 'PERFMON: warning Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 30 MINUTES
EXEC "echo 'PERFMON: warning Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_CPU_SYS_MODE_UTIL > 85% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

# ALM 18812, enhancement for memory
ALARM GBL_MEM_UTIL > 98 FOR 5 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_MEM_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_MEM_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_UTIL > 98% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_MEM_UTIL > 85 FOR 15 MINUTES
START EXEC "echo 'PERFMON: warning Message: GBL_MEM_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 30 MINUTES
EXEC "echo 'PERFMON: warning Message: GBL_MEM_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_UTIL > 85% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_MEM_PAGEOUT_RATE > 98 FOR 5 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_PAGEOUT_RATE > 98% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_MEM_PAGEOUT_RATE > 85 FOR 15 MINUTES
START EXEC "echo 'PERFMON: warning Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 30 MINUTES
EXEC "echo 'PERFMON: warning Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_PAGEOUT_RATE > 85% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"
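
Each ALARM block above has the same shape: a condition that must hold FOR a number of minutes, a START action, a REPEAT action and an END action. The Python sketch below models that shape on a list of per-minute samples. It is only a toy model to show the idea; the exact timing rules of the performance agent are not described in this post, so the way the FOR and REPEAT periods are counted here is an assumption, as are the function and parameter names.

```python
def alarm_events(samples, threshold, for_minutes, repeat_minutes):
    """Return (minute, event) pairs for samples given as (minute, value), in ascending order."""
    events = []
    above_since = None   # minute at which the value first went above the threshold
    started_at = None    # minute at which START fired, if the alarm is active
    last_fired = None    # minute of the most recent START or REPEAT
    for minute, value in samples:
        if value > threshold:
            if above_since is None:
                above_since = minute
            if started_at is None:
                # Assumption: START fires once the condition has held for the FOR period.
                if minute - above_since >= for_minutes:
                    started_at = last_fired = minute
                    events.append((minute, "START"))
            elif minute - last_fired >= repeat_minutes:
                last_fired = minute
                events.append((minute, "REPEAT"))
        else:
            if started_at is not None:
                events.append((minute, "END"))
            above_since = started_at = last_fired = None
    return events


if __name__ == "__main__":
    # 30 minutes at 95% swap utilisation, then a drop to 50%.
    samples = [(m, 95) for m in range(30)] + [(30, 50)]
    print(alarm_events(samples, threshold=90, for_minutes=10, repeat_minutes=15))
```

In this model a short spike that drops back below the threshold before the FOR period has elapsed produces no events at all, which is the point of the FOR clause.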

uxmon_selfcheck.cfg

###############################################################################
#@ $Id: uxmon_selfcheck.cfg 2152 2015-03-03 09:11:56Z zhaofeif $
#@ $Rev: 2152 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 17:11:56 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: uxmon_selfcheck.cfg
#
#############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
#==============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# solution = uxmon
# [check_enable = yes|no]
# Filename threshold severity schedule group
# ===========================================================================
# /dir/file1.log 15m WARNING 0000-2400 * MYGROUP
# /dir/file2.log 1d CRITICAL 0000-2400 * OS
#
#
# check_enable
#===============
# If check_enable is set to YES (or yes), selfcheck will check all the log files; otherwise it will ignore all log files
#
# DEFAULTS
#===============
# If the severity is not specified, warning is assumed.
# If the threshold has no unit, s (seconds) is assumed.
# If the GROUP is not specified, the group NONE is assumed.
# If no qualifier is explicited (> or ” is assumed
#
# THRESHOLD|MISSING
#================
#
# The INTERVAL can be given in different units. Possible units for file age are:
# (no unit) => seconds
# s => seconds
# m => minutes
# h => hours
# d => days
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines.
# The syntax of such a line is: [ OBJECT , EventType, EventTypeInstance ]
# The default value in all cases is NONE.
# You can state only the object: [ OBJECT ]
# You can state the object and EventType and leave EventTypeInstance at its default value: [OBJECT ,EventType ]
# But you can never use more than three fields or fewer than one.
# Fields must be separated by commas, and blank spaces are not allowed
# [ OS, OS, SapApp ]

solution = uxmon
check_enable = no
/var/opt/OV/log/OpC/df_mon.log 1d warning 0000-2400 * OS
/var/opt/OV/log/OpC/act_mon.log 1d warning 0000-2400 * OS
/var/opt/OV/log/OpC/ps_mon.log 1d warning 0000-2400 * OS

#############################################################################
# end of uxmon_selfcheck.cfg
#############################################################################
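
The threshold column of uxmon_selfcheck.cfg is a file age with an optional unit (s, m, h, d, or no unit for seconds). A minimal Python sketch of the unit conversion and of an age check is shown below. The function names are invented, and the check assumes that a log file counts as a problem when it is older than the threshold; the qualifier syntax in the comments above is garbled, so that direction is a guess.

```python
import os
import re
import time

# Seconds per unit, as listed in the comments (no unit means seconds).
UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}


def threshold_seconds(text):
    """Convert a threshold such as '15m' or '1d' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", text.strip().lower())
    if match is None:
        raise ValueError("unsupported threshold: %r" % text)
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def is_stale(path, threshold, now=None):
    """Return True if the file was last modified longer ago than the threshold."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > threshold_seconds(threshold)


if __name__ == "__main__":
    for value in ("15m", "1d", "30"):
        print(value, threshold_seconds(value))
```

With check_enable = no, as configured above, none of the listed log files would be checked regardless of their age.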

uxmonsyslog.cfg

###############################################################################
#@ $Id: uxmon_selfcheck.cfg 2152 2015-03-03 09:11:56Z zhaofeif $
#@ $Rev: 2152 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 17:11:56 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: uxmon_selfcheck.cfg
#
#############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
#==============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# solution = uxmon
# [check_enable = yes|no]
# Filename threshold severity schedule group
# ===========================================================================
# /dir/file1.log 15m WARNING 0000-2400 * MYGROUP
# /dir/file2.log 1d CRITICAL 0000-2400 * OS
#
#
# check_enable
#===============
# If set check_enable to YES (or yes), selfcheck will check all the log files. otherwise it will ignore all log files
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If threshold is not qualified is assumed s (seconds)
# If GROUP not specified is assumed the group NONE
# If no qualifier is explicited (> or ” is assumed
#
# THRESHOLD|MISSING
#================
#
# The INTERVAL can be given in different units.Possible units for file age are:
# (no unit) => seconds
# s => seconds
# m => minutes
# h => hours
# d => days
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such a line is : [ OBJECT , EventType, EventTypeInstance ]
# And the default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state the object and EventType and leave EventTypeInstance at its default value [OBJECT ,EventType ]
# But you can never use more than three fields or fewer than one
# Fields must be separated by commas; blank spaces are not allowed
# [ OS, OS, SapApp ]

solution = uxmon
check_enable = no
/var/opt/OV/log/OpC/df_mon.log 1d warning 0000-2400 * OS
/var/opt/OV/log/OpC/act_mon.log 1d warning 0000-2400 * OS
/var/opt/OV/log/OpC/ps_mon.log 1d warning 0000-2400 * OS

#############################################################################
# end of uxmon_selfcheck.cfg
#############################################################################
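
The threshold units described in the comments above (no unit or s for seconds, m for minutes, h for hours, d for days) can be illustrated with a small shell helper. This is only a sketch of how values such as 15m or 1d translate to seconds; the name to_seconds is hypothetical and this is not how UXMON itself parses the file.

```shell
# Convert a selfcheck-style threshold (for example 15m, 1d or 30) to seconds.
# to_seconds is a hypothetical helper, not part of UXMON.
to_seconds() {
    value=$1
    case "$value" in
        *s) echo "${value%s}" ;;
        *m) echo $(( ${value%m} * 60 )) ;;
        *h) echo $(( ${value%h} * 3600 )) ;;
        *d) echo $(( ${value%d} * 86400 )) ;;
        *)  echo "$value" ;;            # no unit: already seconds
    esac
}
```

With this helper, the 1d threshold used for df_mon.log corresponds to 86400 seconds.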

root@cavaas05:/var/opt/OV/bin/instrumentation # cat uxmonsyslog.cfg
#!/usr/bin/perl
###############################################################################
#@(#) $Id: uxmonsyslog.cfg 2214 2015-06-09 02:20:55Z zhaofeif $
#@(#) $Rev: 2214 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-06-09 10:20:55 +0800 (Tue, 09 Jun 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes

####################
DEFAULT_SEVERITY=WARNING
#Define default severity, can be WARNING, MINOR, MAJOR, CRITICAL
#if not defined, default severity will be WARNING

################
# ErrorClassList
# [WARNING|MINOR|MAJOR|CRITICAL] If not define, use default severity
# [H|S|O|U]:[PERM|TEMP|PERF|PEND|UNKN|INFO] can define error_class:error_type
# H -> Hardware S -> Software O -> errlogger command messages U -> Unknown
# It specifies the classes of errors that can generate an ITO message
#

ERRORCLASSLIST = H
#ERRORCLASSLIST = CRITICAL H:PERM, H:TEMP, S
#ERRORCLASSLIST = S,O,U

##################
# Identifier filter
# You can specify identifier filters, that will cause such events
# be ignored
# See errpt -t for full list
# you can set as many lines ID_FILTER as you wish, if you provide more than one ID in each line
# separate them with comma
# ID_FILTER = AA8AB241, AA8AB242
# ID_FILTER = AA8AB423

###################
# Force Include
# You can force to include specific Identifiers even when such
# belong to a class list that is not explicitly included in ERRORCLASSLIST
# The syntax is like ID_FILTER
# ID_INCLUDE = [WARNING|MINOR|MAJOR|CRITICAL] AA8AB241
# If severity not define, use default severity

#ID_INCLUDE = MAJOR BFE4C025
###################
#AUTOACTION=ERROR_ID::command
#If the command returns error it will be reported, if not it exits silently
#

vc_mon.cfg

###############################################################################
#@ $Id: vc_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@ $Rev: 2162 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: vc_mon.cfg
#
#############################################################################
#
# configuration for VC monitoring
#
# Description of parameters
# —————————————————————————
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [msg_group = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# msg_group
#===============
# msg_group is the alert type you can configure which alarm will be raised. Possible
# values are:
# B -> Browser
# N -> Browser+Notification
# TT -> Browser+Trouble Ticket
# NT -> Browser+Notification+Trouble Ticket
# V -> for ACF

# RG[0]=XYZ Package name 1
# RG_NODE[0]=ABC Primary node on which the package must run
# Define as * will disable running on adoptive node check
# Define as *ALL will treat as parallel resource group
# RG_SWTCH[0]=1 Set to 1 if Package_switching should be ENABLED
# Set to 0 if Package_switching must not be ENABLED
#
# numbers in [] represent a package
#
#
#############################################################################
# example configuration:
# REARM = TRUE
# msg_group = V_ACF
# RG[0]=vc_pack_one; RG_NODE[0]=vc_node_one; RG_SWTCH[0]=1
# RG[1]=vc_pack_two; RG_NODE[1]=vc_node_two; RG_SWTCH[1]=1
# RG[2]=vc_pack_three; RG_NODE[2]=*; RG_SWTCH[2]=1
# RG[3]=vc_pack_three; RG_NODE[3]=*ALL; RG_SWTCH[3]=1
#############################################################################
# end of vc_mon.cfg
#############################################################################

vol_mon.cfg

#############################################################################
#@ $Id: vol_mon.cfg 2214 2015-06-09 02:20:55Z zhaofeif $
#@ $Rev: 2214 $
#@ $Author: zhaofeif $
#@ $Date: 2015-06-09 10:20:55 +0800 (Tue, 09 Jun 2015) $
#@ $LastChangedBy: zhaofeif $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [Disable_Lv_No_Check = yes|no]
# [exclude_lv_no_check ]
#
# [*PACKAGE pkgname]
# [*IPISLOCAL ]
# or
# exclude []
# = hhmm-hhmm []
# = n[,] | * (is also possible a range like 1-5)
# where n represents day of a week starting with
# Sunday=0 and Saturday=6;
# * means all days
# (if you use 1,* or 3,*,4 alike is not allowed)
# exclude_from_stale_extend_check
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# Disable_Lv_No_Check
#===============
# If set disable to YES (or yes), will disable Open Lv and Cur Lv equal number check
# The “exclude_lv_no_check ” syntax is used to disable Open Lv and Cur Lv equal number check of specific volume group.
#
#
#
#======================
# This configuration file for volume monitor is optional. It may be used
# to specify file systems which should be mounted but don’t appear in the
# file /etc/fstab (for HP-UX and Linux)(e.g. ServiceGuard file systems) or /etc/vfstab
# (for Solaris) and /usr/sbin/lsfs (for Aix). The configuration file
# consists of one-line entries each specifying a separate filesystem. In addition, volmon
# not only monitors if the file system is mounted to the specified mount point, but also
# checks the status of logical and physical volumes.
#
# Blank lines and comment lines beginning with a “#” are ignored. Also,
# extra fields after the filesystem entry on the same line are ignored.
# This is useful for specifying the disk space monitoring configuration file
# (df_mon.cfg) as the configuration file for volume monitor.
#
# *PACKAGE. Cluster Awareness
#=============================
# There is another directive , *PACKAGE pkgname, that can be used to inform to VOLMON that
# this resource is managed by a Cluster, and more specifically included in the package named
# pkgname
# The VOLMON , prior to test that file system, it will check if the cluster package is in fact
# running in this local node, if yes, it will process it as any other file system. If it’s not
# running in this node (therefore is running in another one), it will ignore this filesystem
# about the mounted check
#
# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP
#
# exclude
#========================
# The “exclude ” syntax is used to force entries in /etc/fstab (for instance in HP-UX)
# to be ignored. /etc/fstab is intended for listing permanent filesystems
# only, but in some cases temporary filesystems (not usually mounted) are
# also listed. In this case, the temporary filesystems should be listed
# here with the “exclude” option so that vol_mon will not report on them.
#
# exclude_from_stale_extend_check
#======================================================
# The “exclude_from_stale_extend_check ” syntax is used to
# ignore logical file system’s stale extend check. Once you configured the exist Logical file system

############################################################################
# For example:
# REARM = TRUE
# disable this module
# disable = yes

# the volmon will execute every 5 minutes.
# interval = 5

# exclude_from_stale_extend_check /dev/vg00/lvol1
# exclude_from_stale_extend_check /dev/vg00/lvol2

# then volmon will not check /dev/vg00/lvol1 and /dev/vg00/lvol2 for stale extents,
# so no alarm will be reported about stale extents on /dev/vg00/lvol1 and /dev/vg00/lvol2.

#############################################################################
# end of vol_mon.cfg
############################################################################

UXMON: Cluster lock device not up: node02: /dev/disk/disk182(STATUS:unknown) check with cmviewcl

Node : node01.setaoffice.com
Node Type : Itanium 64/32(HTTPS)
Severity : major
OM Server Time: 2015-10-18 05:39:52
Message : UXMON: Cluster lock device not up: node02: /dev/disk/disk182(STATUS:unknown) check with cmviewcl -v -l node
Msg Group : OS
Application : sgmon
Object : cmviewcl
Event Type :
not_found

Instance Name :
not_found

Instruction : check with cmviewcl -v -l node ;

Do not close this case before it is resolved.
As long as this EWM-case is not resolved or closed, monitoring is disabled

Check HP Service Guard status

root@node01:/root # cmviewcl -v

CLUSTER STATUS
cluster_hpux up

NODE STATUS STATE
node01 up running

Cluster_Lock_LVM:
VOLUME_GROUP PHYSICAL_VOLUME STATUS
/dev/vglock /dev/disk/disk75 up

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up LinkAgg0 lan900
PRIMARY up LinkAgg1 lan901
PRIMARY up LinkAgg2 lan902
PRIMARY up LinkAgg3 lan903

PACKAGE STATUS STATE AUTO_RUN NODE
dbciSMP up running enabled vlunx014

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 142.40.81.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled node01 (current)
Alternate unknown node02

Other_Attributes:
ATTRIBUTE_NAME ATTRIBUTE_VALUE
Style legacy
Priority no_priority

NODE STATUS STATE
node02 down halted

Cluster_Lock_LVM:
VOLUME_GROUP PHYSICAL_VOLUME STATUS
/dev/vglock /dev/disk/disk182 unknown

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY unknown LinkAgg0 lan900
PRIMARY unknown LinkAgg1 lan901
PRIMARY unknown LinkAgg2 lan902
PRIMARY unknown LinkAgg3 lan903
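
To pick out the cluster lock devices that are not up without reading the whole listing, the cmviewcl -v output can be filtered. The following is a minimal sketch that assumes the section layout shown above (a Cluster_Lock_LVM: header, a column title line, then one line per lock device); lock_devices_not_up is a hypothetical helper name.

```shell
# Print "<physical volume> <status>" for every cluster lock device whose
# status is not "up". Reads cmviewcl -v output on standard input.
# lock_devices_not_up is a hypothetical helper name.
lock_devices_not_up() {
    awk '
        $1 == "Cluster_Lock_LVM:" { in_lock = 1; next }   # section header
        in_lock && $1 == "VOLUME_GROUP" { next }           # column titles
        in_lock && NF == 3 && $1 ~ /^\// {                 # VG, PV, status
            if ($3 != "up") print $2, $3
            next
        }
        { in_lock = 0 }                                    # any other line ends the section
    '
}

# Example: cmviewcl -v | lock_devices_not_up
```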

In our case, we decommissioned a cluster node, so node02 is always down. We added a setting to the HPOM sg_mon configuration to disable the cluster lock check

node01:/var/opt/OV/bin/instrumentation# cp sg_mon.cfg /var/opt/OV/conf/OpC
vi /var/opt/OV/conf/OpC/sg_mon.cfg
DISABLE_CLUSTER_LOCK_CHECK = yes

UXMON: mpathb – Only one path detected, no path redundancy

Also see:
UXMON: volumegroup – Only one path detected, no path redundancy
UXMON: SY1_log2_disk_001 – Only one path detected, no path redundancy

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : major
OM Server Time: 2015-10-14 12:39:19
Message : UXMON: mpathb – Only one path detected, no path redundancy
Msg Group : OS
Application : mpmon
Object : mp
Event Type :
not_found

Instance Name :
not_found

Instruction : The multipathd -k"show map $device topology" command shows more details

Please check /var/opt/OV/log/OpC/mp_mon.log for more details

The log file shows that the monitor is complaining about mpathb

root@linux:~ # cat /var/opt/OV/log/OpC/mp_mon.log
Wed Oct 14 13:39:13 2015 : INFO : UXMONmpmon is running now, pid=21954
Wed Oct 14 13:39:13 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 13:39:13 2015 : INFO : UXMONmpmon end, pid=21954
Wed Oct 14 13:56:12 2015 : INFO : UXMONmpmon is running now, pid=29130
Wed Oct 14 13:56:12 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 13:56:12 2015 : INFO : UXMONmpmon end, pid=29130
Wed Oct 14 14:13:13 2015 : INFO : UXMONmpmon is running now, pid=36813
Wed Oct 14 14:13:13 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 14:13:13 2015 : INFO : UXMONmpmon end, pid=36813
Wed Oct 14 14:30:13 2015 : INFO : UXMONmpmon is running now, pid=44029
Wed Oct 14 14:30:13 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 14:30:13 2015 : INFO : UXMONmpmon end, pid=44029
Wed Oct 14 14:47:12 2015 : INFO : UXMONmpmon is running now, pid=51897
Wed Oct 14 14:47:13 2015 : INFO : UXMONmpmon end, pid=51897
Wed Oct 14 15:04:12 2015 : INFO : UXMONmpmon is running now, pid=58833
Wed Oct 14 15:04:12 2015 : INFO : UXMONmpmon end, pid=58833
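
When the log is long, it can help to list only the devices that were flagged. The sketch below assumes the log line format shown above ("... : Major: <device> ... Only one path detected ..."); flagged_devices is a hypothetical helper name.

```shell
# List each device that mp_mon.log reports as having a single path.
# Reads the log on standard input; flagged_devices is a hypothetical name.
flagged_devices() {
    sed -n 's/.*: Major: \([^ ]*\) .*Only one path detected.*/\1/p' | sort -u
}

# Example: flagged_devices < /var/opt/OV/log/OpC/mp_mon.log
```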

On this server mpathb is a local disk, so it was added to the multipath blacklist

root@linux:~ # vi /etc/multipath.conf
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    devnode "^sd[ab]$"
    devnode "^cciss!c[0-9]d[0-9]*"
}
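
To see which device names the devnode patterns in this blacklist cover, they can be tried against sample names with grep -E. This is only an approximation of the matching that multipath performs itself, and is_blacklisted is a hypothetical helper; the device names in the loop are examples.

```shell
# Return success if a device node name matches one of the devnode
# patterns from the blacklist above. grep -E only approximates the
# matching that multipath itself performs; is_blacklisted is hypothetical.
is_blacklisted() {
    printf '%s\n' "$1" | grep -Eq \
        -e '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*' \
        -e '^hd[a-z]' \
        -e '^sd[ab]$' \
        -e '^cciss!c[0-9]d[0-9]*'
}

for dev in sda sdb sdc dm-3; do
    if is_blacklisted "$dev"; then
        echo "$dev: blacklisted"
    else
        echo "$dev: not blacklisted"
    fi
done
```

Note that "^sd[ab]$" excludes only sda and sdb, so a third disk such as sdc would still be handled by multipath.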

If the server is a VMware virtual machine, you can safely disable this module.

root@linux:~ # cp /var/opt/OV/bin/instrumentation/mp_mon.cfg /var/opt/OV/conf/OpC/

In the configuration file /var/opt/OV/conf/OpC/mp_mon.cfg set disable to yes

root@linux:~ # vi /var/opt/OV/conf/OpC/mp_mon.cfg
disable = yes
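
The same change can be scripted instead of made in vi. The sketch below assumes the "key = value" layout seen in these cfg files; set_cfg_option is a hypothetical helper, not an HPOM tool, and it rewrites the file through a temporary copy, so check ownership and permissions afterwards.

```shell
# Set "key = value" in a UXMON-style cfg file: replace an existing
# uncommented assignment, or append one if none exists.
# set_cfg_option is a hypothetical helper, not an HPOM tool.
set_cfg_option() {
    file=$1 key=$2 value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.new" &&
            mv "$file.new" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example, after copying the template as shown above:
#   set_cfg_option /var/opt/OV/conf/OpC/mp_mon.cfg disable yes
```

Commented syntax lines such as "# [disable = yes|no]" are left untouched because only lines that start with the key itself are replaced.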

HPOM UXMON: kernel table THRESH_NP is over threshold 80%. Use sar or other available tool tocheck out.

Node : hpux.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : warning
OM Server Time: 2015-10-06 14:24:46
Message : UXMON: kernel table THRESH_NP is over threshold 80%. Use sar or other available tool tocheck out.
Msg Group : OS
Application : ktsmon
Object : THRESH_NP
Event Type :
not_found

Instance Name :
not_found

Instruction : A kernel table is close to be exhausted, this might impact
in the system. Please, check this and take actions if needed.

If the threshold set is too low then increase it in the kts_mon.cfg file

There were several scripts that were being executed by the user application

root@hpux:/ # ps -ef | grep respawn | wc -l
8142
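
A count taken with ps -ef | grep respawn | wc -l may include the grep process itself. A common way around that is to wrap the first character of the pattern in brackets; the sketch below shows the idea with a hypothetical helper name, count_matching.

```shell
# Count lines of "ps -ef" output (on standard input) that match a
# pattern. Writing the pattern as "[r]espawn" keeps the grep process
# out of the count: its own command line contains the brackets, so it
# does not contain the literal text "respawn".
# count_matching is a hypothetical helper name.
count_matching() {
    grep -c -- "$1" || true    # grep -c exits non-zero when the count is 0
}

# Example: ps -ef | count_matching '[r]espawn'
```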

I killed several of them and the load decreased.

root@hpux:/ # uptime
9:33am up 101 day(s), 10:58, 2 users, load average: 2062.32, 3526.27, 3627.53

root@hpux:/ # uptime
9:36am up 101 day(s), 11:01, 2 users, load average: 71.39, 1788.84, 2893.45

root@hpux:/ # uptime
9:38am up 101 day(s), 11:03, 3 users, load average: 13.54, 1261.00, 2575.33

root@hpux:/ # uptime
9:46am up 101 day(s), 11:11, 4 users, load average: 0.63, 254.14, 1509.83

I asked for the application to be restarted because I did not know whether I had killed an important script

HPOM – EXT4-fs: (dm-9): barriers disabled

Node: linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : minor
OM Server Time: 2015-09-28 08:42:14
Message : EXT4-fs: (dm-9): barriers disabled
Msg Group : OS
Application : dmsg_mon
Object : EXT4
Event Type :
not_found

Instance Name :
not_found

Instruction : No

I had just mounted a new filesystem that I set up with the nobarrier parameter in /etc/fstab. The kernel logged the following message in /var/log/messages

root@linux:~ # grep barrier /var/log/messages
Sep 28 09:39:11 linux kernel: EXT4-fs (dm-9): barriers disabled

RHEL 6 Storage Administration Guide – 22.2. Enabling/Disabling Write Barriers

I removed all references to the nobarrier parameter

root@linux:~ # cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon Jun 24 13:57:59 2013
#
# Accessible filesystems, by reference, are maintained under ‘/dev/disk’
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/rootvg-rootlv / ext4 defaults,nobarrier 1 1
/dev/mapper/rootvg-auditlv /audit ext4 defaults,nobarrier 1 2
UUID=4fdc0716-230e-4c5d-a3fe-ba2196bf6c21 /boot ext3 defaults 1 2
/dev/mapper/rootvg-optlv /opt ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-tmplv /tmp ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-usrlv /usr ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-userslv /usr/users ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-varlv /var ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-crashlv /var/crash ext3 defaults,nobarrier 1 2
/dev/mapper/rootvg-swaplv swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/mapper/rootvg-repolv /repo ext4 defaults,nobarrier 1 2
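
The listing above still carries nobarrier on most filesystems. As a sketch of how the option could be dropped from the fourth fstab field without touching comments, the filter below reads fstab text on standard input and writes the cleaned version to standard output; strip_nobarrier is a hypothetical helper, column alignment is collapsed to single spaces, and the result should be reviewed before it replaces /etc/fstab.

```shell
# Remove "nobarrier" from the mount options (4th field) of fstab-style
# input. Comment lines and short lines pass through unchanged.
# strip_nobarrier is a hypothetical helper name.
strip_nobarrier() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { print; next }
        {
            n = split($4, opts, ",")
            out = ""
            for (i = 1; i <= n; i++)
                if (opts[i] != "nobarrier")
                    out = (out == "") ? opts[i] : (out "," opts[i])
            $4 = (out == "") ? "defaults" : out    # never leave the field empty
            print
        }
    '
}

# Example: strip_nobarrier < /etc/fstab > /tmp/fstab.new
```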

Failure: The event flow is broken on solaris.setaoffice.com for the last 60min. Please follow the instructions.

ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls do not take ownership until further notice.
Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : major
OM Server Time: 2015-04-01 14:37:24
Message : Failure: The event flow is broken on solaris.setaoffice.com for the last 60min. Please follow the instructions.
Msg Group : ITO
Application : HealthCheck
Object : OVO-agent
Event Type :
not_found

Instance Name :
not_found

Instruction : (Please carry out instructions in order and record output in ticket)

1) check if there is any maintenance ongoing for the respective system. Set a scheduled outage if yes.

2) check if the system is reachable – login to the server in question and ping the OVO management server. if not pingable, inform the second line or technical lead

3) if the system is reachable, generate a test alert on the node in question.

4) if the test alert is not received, do opcagt -kill; then remove temp queue files (/var/opt/OV/tmp/OpC/*q on Unix or on windows,
…\tmp\OpC\*q); then do opcagt -start on the system. Generate another test alert on the node in question.

5) if the test alert is not received, refer the call to OVO monitoring support team.

Check which host is the HPOM manager and try to ping it

root@solaris:/ # /opt/OV/bin/ovconfget | grep OPC_PRIMARY_MGR
OPC_PRIMARY_MGR=hpommanager.omc.hp.com

root@solaris:/ # ping hpommanager.omc.hp.com
hpommanager.omc.hp.com is alive

Also check the status with the bbcutil tool. If this check is fine as well, the agent can reach the manager, so the problem is likely that the manager is having trouble reaching the managed host.

root@solaris:/ # bbcutil -ping https://hpommanager.omc.hp.com

https://hpommanager.omc.hp.com: status=eServiceOK
coreID=d2ebdec9-48ff-40ec-bf76-eb233981c3a0
bbcV=11.14.014 appN=ovbbccb appV=unknown version
conn=9 time=1199 ms
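
The two checks above can be combined in a small script. The sketch below only parses the output formats shown in this post (the OPC_PRIMARY_MGR= line from ovconfget and the status= field from bbcutil -ping); primary_manager and bbc_status are hypothetical helper names.

```shell
# Extract the primary manager name from "ovconfget" output on stdin.
# primary_manager is a hypothetical helper name.
primary_manager() {
    sed -n 's/^OPC_PRIMARY_MGR=//p'
}

# Extract the status value from "bbcutil -ping" output on stdin.
# bbc_status is a hypothetical helper name.
bbc_status() {
    sed -n 's/.*status=\([A-Za-z]*\).*/\1/p'
}

# Example run on the managed node (paths as used above):
#   mgr=$(/opt/OV/bin/ovconfget | primary_manager)
#   ping "$mgr"
#   [ "$(bbcutil -ping "https://$mgr" | bbc_status)" = "eServiceOK" ] \
#       && echo "agent can reach $mgr"
```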

HPOM – SOL_mon is warning about a network interface

UXMON: genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded

Node : solaris10_node1.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : major
OM Server Time: 2015-06-23 09:22:58
Message : UXMON: genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded
Msg Group : OS
Application : SOL_mon
Object : hardware
Event Type :
not_found

Instance Name :
not_found

Instruction : “The Fault Management agent has identified a HW or OS related problem with the severity presented by the ticket.
The problem(s) can be viewed and managed with the command – fmdump
To get a better understanding of the problem and on how to resolve it, locate the event that generated
the ticket in the syslog file /var/adm/messages, a URL will be found (http://sun.com/msg/xxx-nnnn-yy),
follow the link using your Oracle portal account for instructions.”

Check /var/adm/messages for mentions of this network interface

root@solaris10_node1:/ # grep ce1 /var/adm/messages
Jun 23 10:20:59 solaris10_node1 genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded
Jun 23 10:20:59 solaris10_node1 genunix: [ID 451854 kern.warning] WARNING: ce1: xcvr addr:0x01 – link down

Also check dmesg

root@solaris10_node1:/ # dmesg | grep ce1
Jun 22 14:26:02 solaris10_node1 genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded
Jun 22 14:26:02 solaris10_node1 genunix: [ID 451854 kern.warning] WARNING: ce1: xcvr addr:0x01 – link down
Jun 22 14:26:05 solaris10_node1 genunix: [ID 408789 kern.notice] NOTICE: ce1: fault cleared external to device; service available
Jun 22 14:26:05 solaris10_node1 genunix: [ID 451854 kern.notice] NOTICE: ce1: xcvr addr:0x01 – link up 10 Mbps half duplex
Jun 22 14:26:11 solaris10_node1 genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded
Jun 22 14:26:11 solaris10_node1 genunix: [ID 451854 kern.warning] WARNING: ce1: xcvr addr:0x01 – link down
Jun 22 14:26:12 solaris10_node1 genunix: [ID 408789 kern.notice] NOTICE: ce1: fault cleared external to device; service available
Jun 22 14:26:12 solaris10_node1 genunix: [ID 451854 kern.notice] NOTICE: ce1: xcvr addr:0x01 – link up 10 Mbps half duplex
Jun 23 10:20:59 solaris10_node1 genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded
Jun 23 10:20:59 solaris10_node1 genunix: [ID 451854 kern.warning] WARNING: ce1: xcvr addr:0x01 – link down
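
To judge whether the interface is flapping or failed once, the link down events can be counted. This sketch assumes the message format shown above; link_down_count is a hypothetical helper name.

```shell
# Count "link down" warnings for one interface in syslog-style input
# read from standard input. link_down_count is a hypothetical name.
link_down_count() {
    grep -c -- "WARNING: $1: .*link down" || true   # grep -c exits non-zero on 0 matches
}

# Example: link_down_count ce1 < /var/adm/messages
```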

If this interface does not need to be monitored, configure /var/opt/OV/conf/OpC/dmsg_mon.cfg to suppress this alarm.

UXMON: NTP Problems. Running on local time. Peer: 127.127.1.0 Cur. Offset: 0.000 Cur. Symbol: * Ref. ID: .LOCL.

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : minor
OM Server Time: 2015-03-24 19:54:33
Message : UXMON: NTP Problems. Running on local time. Peer: 127.127.1.0 Cur. Offset: 0.000 Cur. Symbol: * Ref. ID: .LOCL.
Msg Group : OS
Application : ntpmon
Object : ntpq
Event Type :
not_found

Instance Name :
not_found

Instruction : This message shows no valid peers

Please, contact with your UX expert

Please check /var/opt/OV/log/OpC/ntp_mon.log for more details

Checking /var/opt/OV/log/OpC/ntp_mon.log shows the following message:

Mon Mar 30 09:04:47 2015 : NTP Problems. Running on local time. Peer: 127.127.1.0 Cur. Offset: 0.000 Cur. Symbol: * Ref. ID: .LOCL.

Edit /etc/ntp.conf so that ntpd does not use the local clock, by commenting out the server and fudge lines for 127.127.1.0

root@linux:~ # vi /etc/ntp.conf
# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available.
#server 127.127.1.0 # local clock
#fudge 127.127.1.0 stratum 10

UXMON: NTP Problems. Running on local time. Peer: 127.127.1.0 Cur. Offset: 0.000 Cur.

This alert was raised on a Linux system

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : minor
OM Server Time: 2015-03-24 19:54:33
Message : UXMON: NTP Problems. Running on local time. Peer: 127.127.1.0 Cur. Offset: 0.000 Cur. Symbol: * Ref. ID: .LOCL.
Msg Group : OS
Application : ntpmon
Object : ntpq
Event Type :
not_found

Instance Name :
not_found

Instruction : This message shows no valid peers

Please, contact with your UX expert

Please check /var/opt/OV/log/OpC/ntp_mon.log for more details

Verifying the log file I noticed the following

root@linux:~ # tail -50 /var/opt/OV/log/OpC/ntp_mon.log
NTP Problems. Running on local time. Peer: 127.127.1.0 Cur. Offset: 0.000 Cur. Symbol: * Ref. ID: .LOCL.

Comment out the line in /etc/ntp.conf that configures the local clock

#server 127.127.1.0 # local clock

ntpd was still using the fake local clock driver even though four NTP servers were configured

root@linux:~ # ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*LOCAL(0) .LOCL. 10 l 38 64 377 0.000 0.000 0.001
NTP1server 172.16.73.51 4 u 5 16 1 31.138 -15.569 0.001
NTP2server 172.16.73.9 3 u 4 16 1 26.626 -15.001 0.001
NTP3server 172.16.73.50 5 u 3 16 1 23.416 -16.803 0.001
NTP4server 172.16.73.51 4 u 2 16 1 28.340 -10.889 0.001
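
In ntpq -p output the leading * marks the peer that ntpd has selected as its system peer, which here is LOCAL(0) with refid .LOCL. rather than one of the real servers. A minimal sketch of checking that from a script follows; system_peer and on_local_clock are hypothetical helper names.

```shell
# Print "<peer> <refid>" for the peer marked with "*" (the one ntpd
# has selected as its system peer) in "ntpq -p" output on stdin.
# system_peer is a hypothetical helper name.
system_peer() {
    awk '/^\*/ { print substr($1, 2), $2 }'
}

# Succeed if the selected peer is the local clock driver (.LOCL.).
# on_local_clock is a hypothetical helper name.
on_local_clock() {
    system_peer | grep -q '\.LOCL\.'
}

# Example: ntpq -p | on_local_clock && echo "running on local time"
```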

I stopped the NTP service, but ntpq still got an answer from a running daemon

root@linux:~ # service ntp stop
Shutting down network time protocol daemon (NTPD) done

root@linux:~ # ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*LOCAL(0) .LOCL. 10 l 65 64 377 0.000 0.000 0.001
NTP1server 172.16.73.51 4 u – 16 377 30.697 -18.969 2.157
NTP2server 172.16.73.9 3 u 13 16 377 26.172 -1.256 1.121
NTP3server 172.16.73.50 5 u 11 16 377 23.315 -12.326 8.230
NTP4server 172.16.73.51 4 u 16 16 377 28.309 -11.716 2.544

Found that there was already another NTP process running

root@linux:~ # ps -ef | grep ntp
root 23339 1 0 2014 ? 00:38:38 ntpd -pq
root 30880 23657 0 10:07 pts/8 00:00:00 grep ntp

Killed the process

root@linux:~ # kill 23339
root@linux:~ # ps -ef | grep ntp
root 30918 23657 0 10:07 pts/8 00:00:00 grep ntp

NTP stopped listening

root@linux:~ # ntpq -p
ntpq: read: Connection refused

Started the NTP service

root@linux:~ # service ntp start
Starting network time protocol daemon (NTPD) done

root@linux:~ # ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
NTP1server 172.16.73.51 4 u 5 16 1 31.138 -15.569 0.001
NTP2server 172.16.73.9 3 u 4 16 1 26.626 -15.001 0.001
NTP3server 172.16.73.50 5 u 3 16 1 23.416 -16.803 0.001
NTP4server 172.16.73.51 4 u 2 16 1 28.340 -10.889 0.001
