Author: Emerson .

UXMON: The OFFSET in one peer is greater than the threshold: 400

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : minor
OM Server Time: 2015-10-28 13:01:33
Message : UXMON: The OFFSET in one peer is greater than the threshold: 400
Msg Group : OS
Application : ntpmon
Object : ntp
Event Type : not_found
Instance Name : not_found

Instruction : The ntpq -p command shows with one or more peers the
offset in time is greater than the threshold set in the
ntp_mon.cfg.

Please, review the ntp status of your system or increase
the threshold in the ntp_mon if you consider this offset
in time between the clocks of your system and the peer’s clock
is acceptable.

Please check /var/opt/OV/log/OpC/ntp_mon.log for more details

Run ntpq -p and check whether the offset of any peer is higher than the NTP_OFFSET value configured in ntp_mon.cfg. ntpq reports the offset in milliseconds.

root@linux:~ # cat /var/opt/OV/conf/OpC/ntp_mon.cfg
NTP_OFFSET 400
NUM_PEER_FAILS 5

Sync the two NTP servers that you are using and then restart the NTP service on the server. The offsets should then fall back below the threshold:

root@linux:~ # ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
ntp1 142.40.238.18 6 u 23 64 377 1.761 -23.727 9.886
ntp2 15.179.90.135 5 u 30 64 377 2.003 -48.137 4.418
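
To check this without reading the table by eye, here is a minimal Python sketch that parses ntpq -p output and lists the peers over a threshold. It is not how ntpmon itself works. The function name peers_over_threshold is made up for this example, and comparing the absolute value of the offset is my assumption about what the monitor does.

```python
# Sample `ntpq -p` output copied from the session above.
NTPQ_OUTPUT = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
ntp1 142.40.238.18 6 u 23 64 377 1.761 -23.727 9.886
ntp2 15.179.90.135 5 u 30 64 377 2.003 -48.137 4.418
"""


def peers_over_threshold(ntpq_output, threshold_ms):
    """Return (peer, offset) pairs whose absolute offset exceeds threshold_ms."""
    offenders = []
    for line in ntpq_output.splitlines():
        fields = line.split()
        # Peer rows have 10 columns; skip the header and the separator line.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        peer = fields[0]
        # Drop a leading tally code such as '*' or '+' (other codes not handled).
        if peer[0] in "*+-#":
            peer = peer[1:]
        try:
            offset = float(fields[8])  # 9th column is the offset in milliseconds
        except ValueError:
            continue
        # Assumption: the sign does not matter, only the size of the offset.
        if abs(offset) > threshold_ms:
            offenders.append((peer, offset))
    return offenders


print(peers_over_threshold(NTPQ_OUTPUT, 400))  # []
print(peers_over_threshold(NTPQ_OUTPUT, 30))   # [('ntp2', -48.137)]
```

With NTP_OFFSET at 400 neither peer is reported, which matches the state after the restart.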

UXMON: volumegroup – Only one path detected, no path redundancy

Also see:
UXMON: mpathb – Only one path detected, no path redundancy
UXMON: SY1_log2_disk_001 – Only one path detected, no path redundancy

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : major
OM Server Time: 2015-10-26 08:13:59
Message : UXMON: volumegroup – Only one path detected, no path redundancy
Msg Group : OS
Application : mpmon
Object : mp
Event Type : not_found
Instance Name : not_found

Instruction : The multipathd -k"show map $device topology" command shows more details

Please check /var/opt/OV/log/OpC/mp_mon.log for more details

This is a VMware virtual server, as lspci shows, so a single path to its virtual disks is expected

root@linux:~ # lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX – 82443BX/ZX/DX Host bridge (rev 01)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX – 82443BX/ZX/DX AGP bridge (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:07.7 System peripheral: VMware Virtual Machine Communication Interface (rev 10)
00:0f.0 VGA compatible controller: VMware SVGA II Adapter
00:11.0 PCI bridge: VMware PCI bridge (rev 02)
00:15.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:15.7 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:16.7 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:17.7 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.0 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.1 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.2 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.3 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.4 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.5 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.6 PCI bridge: VMware PCI Express Root Port (rev 01)
00:18.7 PCI bridge: VMware PCI Express Root Port (rev 01)
03:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02)
0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
13:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
1b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)

I disabled the mpmon module in its configuration file and then ran the monitor in debug mode to confirm that it exits

root@linux:~ # vi /var/opt/OV/conf/OpC/mp_mon.cfg
disable = yes

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -d mpmon
>>Debug mode activated
>>Opened the logfile: /var/opt/OV/log/OpC/mp_mon.log
>>multipathd is running with just one daemon
>>UXMONmdmon::PARSER_CFG start parsing the md_mon.cfg…….
>>Exit because of disable setting

HPOM threshold configuration files

Here are the files that you may want to copy to /var/opt/OV/conf/OpC and then customize

root@linux:/var/opt/OV/bin/instrumentation # ls -l *.cfg
-rwxr-xr-x. 1 root root 10305 Oct 21 09:05 act_mon.cfg
-rwxr-xr-x. 1 root root 1987 Oct 21 09:05 bond_mon.cfg
-rwxr-xr-x. 1 root root 4330 Oct 21 09:05 boot_mon.cfg
-rwxr-xr-x. 1 root root 3809 Oct 21 09:05 cron_mon.cfg
-rwxr-xr-x. 1 root root 9981 Oct 21 09:05 df_mon.cfg
-rwxr-xr-x. 1 root root 1484 Oct 21 09:05 dmsg_mon.cfg
-rwxr-xr-x. 1 root root 1324 Oct 21 09:05 hw_mon.cfg
-rwxr-xr-x. 1 root root 1493 Oct 21 09:05 kts_mon.cfg
-rwxr-xr-x. 1 root root 1404 Oct 21 09:05 loop_mon.cfg
-rwxr-xr-x. 1 root root 2017 Oct 21 09:05 lp_mon.cfg
-rwxr-xr-x. 1 root root 2150 Oct 21 09:05 md_mon.cfg
-rwxr-xr-x. 1 root root 2567 Oct 21 09:05 mp_mon.cfg
-rwxr-xr-x. 1 root root 8119 Oct 21 09:05 nfs_mon.cfg
-rwxr-xr-x. 1 root root 1607 Oct 21 09:05 nic_mon.cfg
-rwxr-xr-x. 1 root root 1786 Oct 21 09:05 ntp_mon.cfg
-rwxr-xr-x. 1 root root 10023 Oct 21 09:05 ps_mon.cfg
-rwxr-xr-x. 1 root root 4878 Oct 21 09:05 rc_mon.cfg
-rwxr-xr-x. 1 root root 1832 Oct 21 09:05 sc_mon.cfg
-rwxr-xr-x. 1 root root 3778 Oct 21 09:05 scsi_mon.cfg
-rwxr-xr-x. 1 root root 2919 Oct 21 09:05 sg_mon.cfg
-rwxr-xr-x. 1 root root 1956 Oct 21 09:05 sshd_mon.cfg
-rwxr-xr-x. 1 root root 1831 Oct 21 09:05 svcs_mon.cfg
-rwxr-xr-x. 1 root root 2860 Oct 21 09:05 swap_mon.cfg
-rwxr-xr-x. 1 root root 7925 Oct 21 09:05 UXMONbroker.cfg
-rwxr-xr-x. 1 root root 3947 Oct 21 09:05 UXMONmetrics.cfg
-rwxr-xr-x. 1 root root 3246 Oct 21 09:05 UXMONperf.cfg
-rwxr-xr-x. 1 root root 2966 Oct 21 09:05 uxmon_selfcheck.cfg
-rwxr-xr-x. 1 root root 2280 Oct 21 09:05 uxmonsyslog.cfg
-rwxr-xr-x. 1 root root 2273 Oct 21 09:05 vc_mon.cfg
-rwxr-xr-x. 1 root root 5479 Oct 21 09:05 vol_mon.cfg

act_mon.cfg

###############################################################################
# GD UX MON #
#@(#) $Id: act_mon.cfg 2180 2015-05-11 09:11:18Z baoliz $
#@(#) $Rev: 2180 $
#@(#) $Author: baoliz $
#@(#) $Date: 2015-05-11 17:11:18 +0800 (Mon, 11 May 2015) $
#@(#) $LastChangedBy: baoliz $
###############################################################################

#############################################################################
# File: act_mon.cfg
# Description: The File Activity Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
# The intention of this script is to monitor the last modification time of a
# file, or to monitor its size. This is used to supervise other programs or
# scripts which have to write regularly to their logfile. If a program or a
# script doesn’t modify “its” file, there is probably something wrong with this process.
#
# If the configured interval is exceeded for the file which is intended to be
# monitored, or if the size is above or below the configured limit (depending
# on whether the size threshold has a modifier),
# a log-message is written
#
#

################################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
# Filename threshold|MISSING severity schedule group
# ===========================================================================
# /dir/file1 15m WARNING 0000-2400 * MYGROUP
# /dir/file2 50s CRITICAL 0000-2400 * I
# /dir/file3 >100KB WARNING 0000-2400 * OTHERGROUP
# /dir 15n WARNING 0000-2400 *
# /dir/file MISSING
#
# Note wildcards are supported as well
# /dir/file1*4 5n WARNING
# /d*r/file1*4 15m WARNING
# /dir/prod/file1*4 >100KB WARNING
#
# rearm
#==============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If threshold is not qualified is assumed s (seconds)
# If GROUP not specified is assumed the group NONE
# If no qualifier is explicited (> or <), ">" is assumed
#
# THRESHOLD|MISSING
#================
#
# The INTERVAL can be given in different units.Possible units for file age are:
# (no unit) => seconds
# s => seconds
# m => minutes
# h => hours
# d => days
# Possible units for file size are:
# KB => kilobytes
# MB => megabytes
# GB => gigabytes
# File count is configured with a small n
# n => File count
#
# In addition, file sizes may be preceded by the optional ">" or "<". The default is ">" (if the file is larger than the threshold,
# a message will be sent).
#
# The file count will count the number of files in a directory(s). Note it
# will exclude directories within directories
#
# There is a special entry that can be used to monitor a directory based on a
# date. For example an application might output to a directory for a particular
# day and you need to check the number of files or that kind. By using
# the template (%FORMAT) the current date will be substituted e.g.
# /tmp/(%DDMMYYYY) monitors /tmp/26032006 . The date format depends on
# the template. The template can be formed with Y, M, D (year month day) and h m s
# hour, minute, second) (please, note M vs m). If you want to specify a time format
# with year of 2 digits, use YY , with 4 digits, YYYY.
# For instance, DDMMYYhhmmss is a full date timestamp like 230306152558
#
# MISSING
#===================
# The MISSING option is used to check if the file exists.
# If the argument MISSING is configured, and the file does not exist, it will trigger an alarm to OVO.
# If MISSING is not configured, no alarm is triggered to OVO; the condition is only reported in the LOGFILE.
#
#
# SEVERITY
#===================
# The SEVERITY can be one of the following:
# WARNING, MINOR, MAJOR, CRITICAL (case insensitive)
# If not explicited, it will be assumed warning.
# Please be considerate with the severities; think twice before using CRITICAL, the most severe.
#
# SCHEDULE
#=====================
# You can define the time frame for each line/configuration giving an specific Schedule.
# This means that one file or configuration it will be only considered when the current time falls
# within the scheduled time configured
# The format is hhmm-hhmm days, some examples are:
# 0000-2400 * The basic configuration means in any day in any moment
# 0900-1800 1-5 Means to check from 09:00 am to 18:00 pm from Monday to Friday.
# 0000-2400 6,0 Means at any hour only Saturday and Sunday
#
# ACTIONS
#================
# You can declare an ACTION, that will be triggered if the threshold is
# exceeded. The ACTION follows the syntax *ACTION <command>
# Examples of allowed syntax:
# *ACTION rm -f /tmp/*.gz
#
# *ACTION gzip !FSNAME/*.log ; rm !FSNAME/*.txt
#
# Therefore, the ACTION can be in a different line or just following the
# filesystem declaration. (all in one line)
# Besides, you have some variables that will be replaced prior to trigger
# the execution:
# !GROUP (the group used)
# !MESSAGE (the same message that will be write in the logfile)
# !SEVERITY (The severity declared in this alarm)
#
#
#
# PACKAGE. Cluster Awareness
#=============================
# There is another directive , *PACKAGE pkgname, that can be used to inform to ACTMON that
# this resource is managed by a Cluster, and more specifically included in the package named
# pkgname
# The ACTMON , prior to test that file system, it will check if the cluster package is in fact
# running in this local node, if yes, it will process it as any other file system. If it’s not
# running in this node (therefore is running in another one), it will ignore this file system
#
# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP
#
# I
#=============================
# I is configured at the line end that means ignore the alarm even the file attribute breaches the threshold
#
# GROUPING
#==============================
# You can group the alarms, giving them the same GROUP. In that case
# only will be reported the MOST SEVERE ALARM and others will be masked by this one
# If you don’t set a group, is considered then the group NONE.
# Even although the message be masked by another more critical message
# in the same group, its action IT WILL BE EXECUTED. And, proper log message
# will be write.
#
#
# AUTORECOVERY. (TOTAL AUTORECOVERY)
#=====================================
# Each time an alarm happens, an action (if defined) is triggered, then a second
# check will be performed, so this monitoring is based on two phases:
# first, checks the alarms, and triggers the actions if any.
# second, evaluates the alarms again, and, only those who persist will be logged
# In that way, ACTION can help to automatize the maintenance and decrease the number of alarms.
#
#
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType and leave EventTypeInstance at its default value [OBJECT ,EventType ]
# But you can never use more than three fields or less than 1
# Fields to be separated by commas and blank spaces not allowed
# [ OS, OS, SapApp ]
# process 1- warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#
#
################################################################################
#
# Some examples more
#
# disable this module
# disable = yes

# this module allow to run after every 10 minutes
# interval = 10

#REARM = true

#/var/opt/OV/log/OpC/cl_mon.log 1h CRITICAL 0000-2400 *
#/var/opt/osit/log 10n
#/var/adm/syslog/syslog.log 3d warning 0800-1700 1-5 OPS
#/tmp/telelezing/(%DDMMYY) <2n warning MYGROUP
#/tmp/core MISSING
#
#############################################
# Make your CONFIGURATION FROM HERE #
#############################################
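
To make the threshold syntax above concrete, here is a rough Python sketch of how entries such as 15m, >100KB or <2n could be interpreted. This is my reading of the comments, not the actual act_mon parser. The function name parse_threshold is made up, the 1024-based size units are an assumption, and so is the default ">" qualifier.

```python
import re

# Multipliers for file-age units (result in seconds).
AGE_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}
# Multipliers for file-size units (assumption: 1 KB = 1024 bytes).
SIZE_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_threshold(text):
    """Return (kind, qualifier, value) for an act_mon-style threshold string."""
    if text == "MISSING":
        return ("missing", None, None)
    match = re.fullmatch(r"([<>]?)(\d+)([A-Za-z]*)", text)
    if not match:
        raise ValueError(f"unrecognised threshold: {text!r}")
    qualifier = match.group(1) or ">"  # assumed default when none is given
    number = int(match.group(2))
    unit = match.group(3)
    if unit == "n":                    # small n means file count
        return ("count", qualifier, number)
    if unit in SIZE_UNITS:
        return ("size_bytes", qualifier, number * SIZE_UNITS[unit])
    if unit in AGE_UNITS:              # no unit at all means seconds
        return ("age_seconds", qualifier, number * AGE_UNITS[unit])
    raise ValueError(f"unknown unit in threshold: {text!r}")


for example in ("15m", "50s", ">100KB", "<2n", "3d", "MISSING"):
    print(example, "->", parse_threshold(example))
```

Note that the unit lookup is case sensitive, so 15m is minutes while 15M is rejected, in line with the M vs m warning in the comments.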

bond_mon.cfg

#############################################################################
#@ $Id: bond_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@ $Rev: 2197 $
#@ $Author: baoliz $
#@ $Date: 2015-05-17 18:16:07 +0800 (Sun, 17 May 2015) $
#@ $LastChangedBy: baoliz $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
# exclude <device>
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# exclude
#========================
# The "exclude <device>" syntax is used to force entries of the device name to be ignored.
#

############################################################################
# For example:

# disable this module
# disable = yes

# the mdmon will execute every 5 minutes.
# interval = 5

# exclude some device
# exclude bond0
# exclude bond1

#############################################################################
# end of bond_mon.cfg
############################################################################

boot_mon.cfg

###############################################################################
#@(#) $Id: boot_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $ #
# =========================================================================== #
# Copyright (c) 2006 Hewlett Packard – All Rights Reserved. #
# Author: Cesar Lombao Vazquez #
###############################################################################

#############################################################################
# File: boot_mon.cfg
# Description: The File Activity Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
# The intention of this script is to monitor whether the system has rebooted within 1 day.
#
#
################################################################################
#
# Syntax:
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]

# severity=xxx
# group=yyy
# typeinstance=ccccc
# eventtypeinstance=ttttt
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
#===============
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# WARNING MYGROUP
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If GROUP not specified is assumed the group NONE
#
#
# GROUP
# ===================
# With GROUP it can be influenced in the alarm routing and ticketing interface, knowing that:
# IF the GROUP starts with B_ the alarm will remain in the browser ONLY
# IF the GROUP starts with TT_ the alarm will be sent to Ticketing systems (THIS IS THE DEFAULT BEHAVIOUR)
# IF the GROUP starts with N_ the alarm will be sent to the Notification system
# IF the GROUP starts with TN_ the alarm will be sent both to Notif and TT
# IF the GROUP does not start with any of the above alarm will be sent to TroubleTicket system (T_ )
#
# SEVERITY
#===================
# The SEVERITY can be one of the following:
# WARNING, MINOR, MAJOR, CRITICAL (case insensitive)
# If not explicited, it will be assumed warning.
# Please be considerate with the severities; think twice before using CRITICAL, the most severe.
#
# CMA SUPPORT
#========================================
# You can access to the EventType and EventTypeInstance
#
#############################################################################
#
#
################################################################################
#
# Some examples more
#
# severity=warning
# group=MYGROUP
#
#############################################
# Make your CONFIGURATION FROM HERE #
#############################################
#REARM = true

disable = no

severity = major
group = BOOT

eventtype = NONE
eventtypeinstance = NONE
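
The GROUP prefixes described in boot_mon.cfg amount to a small lookup table. Here is a Python sketch of that routing as I understand it from the comments. The function name route_for_group and the destination labels are mine, and I am assuming the prefix match is case sensitive.

```python
# Prefix -> destination, taken from the GROUP section of boot_mon.cfg.
ROUTES = [
    ("B_", "browser only"),
    ("TT_", "trouble ticket"),
    ("N_", "notification"),
    ("TN_", "notification + trouble ticket"),
]
# Groups without a known prefix go to the ticketing system by default.
DEFAULT_ROUTE = "trouble ticket"


def route_for_group(group):
    """Return where an alarm with this GROUP would be routed."""
    for prefix, destination in ROUTES:
        if group.startswith(prefix):
            return destination
    return DEFAULT_ROUTE


print(route_for_group("BOOT"))    # trouble ticket
print(route_for_group("B_BOOT"))  # browser only
```

So the group = BOOT setting in the file above would, under this reading, end up in the ticketing system.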

cron_mon.cfg

###############################################################################
#@(#) $Id: cron_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: cron_mon.cfg
# Description: The cron Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

#############################################################################
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
#
#
#
#
# Command line triggered by the cron
# <severity> = warning|minor|major|critical (case insensitive)
# <schedule> = hhmm-hhmm [<days>]
# <days> = n[,n] | * (a range like 1-5 is also possible)
# where n represents day of a week starting with
# Sunday=0 and Saturday=6;
# * means all days
# (if you use 1,* or 3,*,4 alike is not allowed)
#
#
#
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If GROUP not specified is assumed the group NONE
# If schedule not specified is assumed every day , at any time
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
#
#
# GROUPING
#==============================
# You can group the alarms, giving them the same GROUP. In that case
# only will be reported the MOST SEVERE ALARM and others will be masked by this one
# If you don’t set a group, is considered then the group NONE.
# Even although the message be masked by another more critical message
# in the same group, its action IT WILL BE EXECUTED. And, proper log message
# will be write.
#
#
# AUTORECOVERY. TOTAL
#================================
# Due the nature of the cron, there is no autorecovery, therefore
# each time an alarm be ready to be triggered it will be.
#
#
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType and leave EventTypeInstance at its default value [OBJECT ,EventType ]
# But you can never use more than three fields or less than 1
# Fields to be separated by commas and blank spaces not allowed
# [ OS, OS, CRON ]
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#############################################################################
#################
### EXAMPLES #
#################
#REARM = true
#
#/usr/local/bin/myexecution warning 0000-2400 * MYGROUP
#/opt/tool/bin/mytool MYTOOL

#####################
# End of examples #
#####################

###############################
# Start your config from here #
###############################

#############################################################################
# end of cron_mon.cfg
#############################################################################
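
The schedule field (hhmm-hhmm plus a day list with Sunday=0) shows up in several of these files, so here is a small Python sketch of how a line's schedule could be checked against the current time. The function names parse_days and in_schedule are made up, and treating the end time as exclusive is my assumption.

```python
from datetime import datetime


def parse_days(spec):
    """Expand '*', '1-5' or '6,0' into a set of weekday numbers (Sunday=0)."""
    if spec == "*":
        return set(range(7))
    days = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            days.update(range(start, end + 1))
        else:
            days.add(int(part))  # raises ValueError for mixes like '1,*'
    return days


def in_schedule(schedule, days, now):
    """Return True if `now` falls inside 'hhmm-hhmm' on one of the given days."""
    start, end = (int(x) for x in schedule.split("-"))
    current = now.hour * 100 + now.minute
    # isoweekday() is Monday=1..Sunday=7; the config uses Sunday=0..Saturday=6.
    weekday = now.isoweekday() % 7
    return weekday in parse_days(days) and start <= current < end


alert_time = datetime(2015, 10, 28, 13, 1)  # a Wednesday
print(in_schedule("0900-1800", "1-5", alert_time))  # True
print(in_schedule("0000-2400", "6,0", alert_time))  # False
```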

df_mon.cfg

###############################################################################
#@(#) $Id: df_mon.cfg 2201 2015-05-19 08:07:49Z baoliz $
#@(#) $Rev: 2201 $
#@(#) $Author: baoliz $
#@(#) $Date: 2015-05-19 16:07:49 +0800 (Tue, 19 May 2015) $
#@(#) $LastChangedBy: baoliz $
###############################################################################

#############################################################################
# File: df_mon.cfg
# Description: The Diskspace Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

#############################################################################
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
# exclude
# [] [] [] [ ; GROUP ]
# [*ACTION action] [*PACKAGE pkgname] [*IPISLOCAL ] [*DURATION ]
#
# space utilization threshold
# inode utilization threshold
# <severity> = normal|warning|minor|major|critical (case insensitive)
# <schedule> = hhmm-hhmm [<days>]
# <days> = n[,n] | * (a range like 1-5 is also possible)
# where n represents day of a week starting with
# Sunday=0 and Saturday=6;
# * means all days
# (if you use 1,* or 3,*,4 alike is not allowed)
#
# Use '-' to skip threshold parameter specification.
# If parameter is not specified the checking of that value is skipped.
#
# Examples allowed:
# * 95 90 warning
# /opt/SAP* 95 warning
# /tmp - 90% ; MYGROUP
# /home/userx 50Mb - 0000-2400 0-6 ; USERS
# /opt 70 - major 0600-1800 1,2,3,4,5
# /var 75% - critical 0600-2200 *
# /var/opt/OV 5Gb ; OVO (here the severity is taken by default-> warning)
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# DEFAULTS
#===============
# If Severity not specified is assumed warning.
# If threshold is not qualified with % or Gb , or Mb, is assumed %
# If GROUP not specified is assumed the group NONE
#
# THRESHOLD
#================
# The Thresholds can be set specifying the percentage used (the default)
# or absolute values in Mb or Gb. Thresholds valid (only for disk space) are
# 75Mb 34Gb 34 10%
# In the case of 34 is assumed you want to mean 34%
#
# ACTIONS
#================
# You can declare an ACTION, that will be triggered if the threshold is
# exceeded. The ACTION follows the syntax *ACTION <command>
# Examples of allowed syntax:
# /tmp 1Mb - warning ; OS *ACTION rm -f /tmp/*.gz
#
# /home/userx 1Mb - warning
# *ACTION gzip !FSNAME/*.log ; rm !FSNAME/*.txt
#
# Therefore, the ACTION can be in a different line or just following the
# filesystem declaration. (all in one line)
# Besides, you have some variables that will be replaced prior to trigger
# the execution:
# !FSNAME (filesystem name)
# !GROUP (the group used)
# !MESSAGE (the same message that will be write in the logfile)
# !REASON (SPACE or INODE, depending the reason that triggered the alarm)
# !THRESHOLD (The Threshold that has been exceeded)
# !CURRENT (The current value compared with the threshold)
# !SEVERITY (The severity declared in this alarm)
#
#
# PACKAGE. Cluster Awareness
#=============================
# There is another directive , *PACKAGE pkgname, that can be used to inform to DFMON that
# this resource is managed by a Cluster, and more specifically included in the package named
# pkgname
# The DFMON , prior to test that file system, it will check if the cluster package is in fact
# running in this local node, if yes, it will process it as any other file system. If it’s not
# running in this node (therefore is running in another one), it will ignore this file system
#
# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP

# DURATION
#=============================
# Ticket is triggered after duration minutes and ticket will be held within DURATION minutes, if DURATION is configured.

# GROUPING
#==============================
# You can group the alarms, giving them the same GROUP. In that case
# only will be reported the MOST SEVERE ALARM and others will be masked by this one
# If you don’t set a group, is considered then the group NONE.
# Even although the message be masked by another more critical message
# in the same group, its action IT WILL BE EXECUTED. And, proper log message
# will be write.
#
#
# AUTORECOVERY. (TOTAL AUTORECOVERY)
#=====================================
# Each time an alarm happens, an action (if defined) is triggered, then a second
# check will be performed, so this monitoring is based on two phases:
# first, checks the alarms, and triggers the actions if any.
# second, evaluates the alarms again, and, only those who persist will be logged
# In that way, ACTION can help to automatize the maintenance and decrease the number of alarms.
#
#
#========================================
# Duplicated definition of one filesystem
# /tmp 50 50 warning
# /tmp 90 90 critical
# File system /tmp exceeds threshold 50% of disk space usage and 50% of inode usage will send alarm with severity warning
# and when exceeds threshold 90% of disk space usage and 90% of inode usage will send alarm with severity critical
#
# * 50 50 warning
# /tmp 90 90 warning
# Character * is special, all file systems’ threshold represent the 50% of disk space usage and the 50% of inode usage except /tmp,
# /tmp only alarm when exceeds threshold of 90% of disk space usage and 90% of inode usage.
#
# /var/tmp 100GB warning
# /var/tmp 90 90 warning
# Different types of threshold, both lines will be available, /var/tmp will send alarm whatever it has less than 100GB or
# exceeds threshold 90% of disk space usage and 90% of inode usage.
#
#
# EXCLUDE. Ignore filesystems
#=====================================
# you can set the directive Filesystem Exclude, for instance
# /tmp exclude
# /var/opt/SAP* exclude
#
# And these file systems will be ignored and therefore not checked and not executed any action
# However, you must note the order IS RELEVANT, for instance
# /tmp 1 1 warning
# /tmp exclude
# will cause the /tmp be ignored, but
# /tmp exclude
# /tmp 1 1 warning
# will make the /tmp be considered.
#
# A more complex example, lets imagine you have file systems like /var/opt/FS* that you want
# be checked, however, you know there some like /var/opt/FS*tmp not relevant for you, that you
# would prefer to ignore:
# /var/opt/FS* 1Gb - warning
# /var/opt/FS*tmp exclude
#
# SO BEWARE, if your last line is like :
# * exclude
# this will be the same like to have the df_mon.cfg empty , ALL WILL BE IGNORED
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state object and EventType and leave EventTypeInstance at its default value [OBJECT ,EventType ]
# But you can never use more than three fields or less than 1
# Fields to be separated by commas and blank spaces not allowed
# [ OS, OS, SapApp ]
# * 95 95 warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#############################################################################
#################
### EXAMPLES #
#################

# disable this module
# disable = yes

# this module allow to run after every 10 minutes
# interval = 10

# Monitor all filesystems
#* 95 95 warning

# exclude the /tmp
#/tmp exclude

# Create different severities for different thresholds for the same filesystem
#/ 80 100 major
#/ 95 100 critical

# Set an action when the disk space is 99% (inodes are ignored)
#/tmp 99 - major *ACTION gzip /tmp/*.log /tmp/*.txt

# Use the GROUP option
#/usr 80 2 major ; FILESYS

# Set an absolute threshold, if there is less than 1Gb of disk free space, then WARNING
# with GROUP FILESYS, and then, execute an action that stores the warning message in a log
#/tmp 1Gb WARNING ; FILESYS
#*ACTION echo !MESSAGE > kk.log ; date >> kk.log

# Use a percent threshold and execute an action (note the syntax, is allowed all in one line)
#/home 99% 1 *ACTION echo !MESSAGE > kk.log ; date >> kk.log

# Another example of ACTION
#/tmp 12 10 *ACTION rm /tmp/*pdf

# Allowed, but really, really dangerous, if this is the last line is like disabling the whole mon
#* exclude

# Using the Cluster Awareness, in this case, ONLY if the package pkg3 is running in this
# node, the /tmp will be checked. And if threshold exceeded, then it will execute an action
#/tmp 1 1 WARNING *PACKAGE pkg3
#*ACTION echo "hola" > kk.log

#####################
# End of examples #
#####################

###############################
# Start your config from here #
###############################
#REARM = true

* 95 95 warning
* 98 98 major

#############################################################################
# end of df_mon.cfg
#############################################################################
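
The two active lines above (`* 95 95 warning` and `* 98 98 major`) can be read as a small rule table. Below is a minimal Python sketch of that reading. It is not UXMON's own code: the column meaning (space %, inode %, severity) is an assumption taken from the example comments, the strict "greater than" comparison is a guess, and `parse_rules` and `worst_severity` are hypothetical helper names.

```python
# Sketch only: column meaning (space %, inode %, severity) is assumed from the
# example comments in df_mon.cfg, not taken from the UXMON source code.
SEVERITY_RANK = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


def parse_rules(text):
    """Return (filesystem, space_pct, inode_pct, severity) for each plain threshold line."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Keep only simple four-field lines; REARM, exclude, *ACTION etc. are skipped.
        if (len(fields) == 4 and fields[1].isdigit() and fields[2].isdigit()
                and fields[3].lower() in SEVERITY_RANK):
            rules.append((fields[0], int(fields[1]), int(fields[2]), fields[3].lower()))
    return rules


def worst_severity(rules, space_used_pct, inode_used_pct):
    """Return the most severe rule that is exceeded, or None if none is."""
    hits = [sev for _, space, inode, sev in rules
            if space_used_pct > space or inode_used_pct > inode]
    return max(hits, key=SEVERITY_RANK.get, default=None)


CFG = "#REARM = true\n\n* 95 95 warning\n* 98 98 major\n"
RULES = parse_rules(CFG)
print(worst_severity(RULES, 96, 10))
print(worst_severity(RULES, 99, 10))
```

With this reading, a filesystem at 96% would only raise a warning, and the severity would step up to major once usage passes 98%.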

dmsg_mon.cfg

###############################################################################
#@(#) $Id: dmsg_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@(#) $Rev: 2132 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2014-08-22 14:47:32 +0800 (Fri, 22 Aug 2014) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

###############################################################################
#
# File: dmsg_mon.
# [disable = yes|no]
# [interval = ]
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed

# Description: strings listed here don’t generate an ITO message for dmesg
# Syntax: just list the strings, one line for each
# !!! all dmesg lines matching one of the listed strings
# are taken out of monitoring !!!
#
# Example:
#
# hardware path
#
# If the string “hardware path” is listed, all dmesg lines matching (containing)
# the string “hardware path” are ignored for monitoring purposes.
# Still, the dmesg history contains these lines, but no message is generated.
#
###############################################################################

###############################################################################
# End of dmesg_mon.cfg
###############################################################################

hw_mon.cfg

#############################################################################
#@ $Id: hw_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@ $Rev: 2149 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
##############################################################################
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#[ignore string]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed

# ignore string
#===============
# Any command output that matches an ignore string is not recorded in hw_mon.log, which means that kind of hardware error is ignored

# The lines below are predefined ignore strings; uncomment a line if you need to ignore that kind of error
# Please don’t modify the predefined strings below; only comment or uncomment them.

#REARM = true

#Power Supply Error
#FAN Error
#Thermal Sensor Error
#Memory Failed
#CPU Failed
#Physical Drive Failed
#Drive Array Accelerator Battery Failed

kts_mon.cfg

##################################################################################
#@ $Id: kts_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@ $Rev: 2149 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
##################################################################################
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]

# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# PARAMETERS
# THRESH_NP=
# THRESH_NI=
# THRESH_NF=
# THRESH_NK=
#
# THRESH_NP stands for the size of the kernel table for running processes
# THRESH_NI stands for the size of the kernel table for inodes
# THRESH_NF stands for the size of the kernel table for open files
# THRESH_NK stands for the size of the kernel table nkthreads (HP-UX only)
# The value is a positive integer that states the percentage (please don’t add the % sign)
#
# Support
# Linux, solaris, HPUX supported
# but not all parameters are available
# Those not supported will be simply discarded
# Please, refer to documentation to know which ones are supported
#
#
#REARM = true

THRESH_NP=80
THRESH_NI=101
THRESH_NK=80
#THRESH_NF=70
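
To make the percentage thresholds above concrete, here is a small Python sketch. It assumes each `THRESH_*` value is compared against "entries in use / table size" as a percentage and that an alarm means strictly exceeding it; both points are assumptions, and `parse_thresholds`, `breached` and the usage numbers are made up for illustration.

```python
def parse_thresholds(text):
    """Collect active THRESH_* = <percent> lines; commented-out lines are skipped."""
    thresholds = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("THRESH_") and "=" in line:
            key, _, value = line.partition("=")
            thresholds[key.strip()] = int(value)
    return thresholds


def breached(thresholds, usage):
    """usage maps a THRESH_* name to (entries_in_use, table_size); the numbers are hypothetical."""
    alarms = []
    for key, limit in sorted(thresholds.items()):
        if key not in usage:
            continue  # parameter not available on this platform
        used, size = usage[key]
        pct = 100.0 * used / size
        if pct > limit:  # assumption: alarm only when strictly above the threshold
            alarms.append((key, round(pct, 1)))
    return alarms


CFG = "THRESH_NP=80\nTHRESH_NI=101\nTHRESH_NK=80\n#THRESH_NF=70\n"
THRESHOLDS = parse_thresholds(CFG)
print(breached(THRESHOLDS, {"THRESH_NP": (900, 1000), "THRESH_NI": (1000, 1000)}))
```

Under this comparison a threshold of 101 can never be exceeded, so `THRESH_NI=101` would behave like a switched-off check in the sketch. Whether the real monitor treats it that way is not stated in the file.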

loop_mon.cfg

###############################################################################
#@(#) $Id: loop_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
# #############################################################################
#############################################################################
# Syntax:
# [REARM = TRUE|FALSE]
# “DISABLE =[yes|no]”
# [interval = ]

# CPU_THRESHOLD =
# TIME_THRESHOLD =
#
# “EXCEPT_PRC= ”
# “EXCEPT_PRC= ”
################################
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# ENABLE / DISABLE THIS MODULE
#===============
# If you configure
# DISABLE =yes
# loopmon will be disabled, else loopmon will be enabled. The default configuration is enabled

# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed

##################################
# FILTERING PROCESSES
# If you configure:
# EXCEPT_PRC=processname1
# EXCEPT_PRC=processname2
# EXCEPT_PRC=processname3
# Which means the processes processname1, processname2, processname3 will be ignored.
##################
#
# START YOUR CONFIG FROM HERE
#REARM = true

DISABLE = yes

lp_mon.cfg

###############################################################################
#@(#) $Id: lp_mon_aix_linux_solaris.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

##############################For AIX and Solaris ####################################
#For AIX and Solaris
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# lpsched_check=YES|NO
## [queue_length=] [disable_check=YES|NO] [severity] [;]
# exclude [,…]
#
# – lpsched_check – If “YES” check if printing system is available (qdaemon). If “NO”,don’t check it and the monitoring is disabled.
# – queue_length – Max number of pending requests
# – disable_check – Check if printer or spooling is disabled
# – severity. – It can be one of: critical, major, minor, warning
# – exclude – Exclude the printer from checking
#
# Examples
# ——-
# LPSCHED_CHECK = YES critical TT_PRINTER
# * queue_length = 20 disable_check = YES major TT_PRINTER
#
# ——
# LPSCHED_CHECK = NO
#
# ——
# LPSCHED_CHECK = YES
# myprint disable_check = YES major LP ; 0800-1800 *
#
####################################################################################
#REARM = true

lpsched_check=NO
#* queue_length=30 disable_check=YES

####################################################################################
# end of lp_mon.cfg
####################################################################################

md_mon.cfg

#############################################################################
#@ $Id: md_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@ $Rev: 2175 $
#@ $Author: baoliz $
#@ $Date: 2015-04-22 01:30:55 +0800 (Wed, 22 Apr 2015) $
#@ $LastChangedBy: baoliz $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# exclude
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# exclude
#========================
# The “exclude ” syntax is used to force entries of the device name or UUID to be ignored.
#

############################################################################
# For example:

# disable this module
# disable = yes

# the mdmon will execute every 5 minutes.
# interval = 5

# Configurable timeout for hanging command ‘mdadm’, default is 120s
# cmd_timeout = 120

# exclude some device
# exclude md1
# exclude md2
# exclude 008ef942:540c0f64:ad58b352:a70017dd

#############################################################################
# end of md_mon.cfg
############################################################################
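
The md_mon.cfg syntax is simple enough to read in a few lines of Python. The sketch below only illustrates the `key = value` and `exclude` lines shown in the examples; `parse_md_mon` and `is_monitored` are hypothetical names, and this is not how the monitor itself parses the file.

```python
def parse_md_mon(text):
    """Split active lines into key = value settings and a list of excluded devices/UUIDs."""
    settings, excluded = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("exclude "):
            excluded.append(line.split(None, 1)[1])
        elif "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip().lower()] = value.strip()
    return settings, excluded


def is_monitored(device, excluded):
    """A device (name or UUID) is monitored unless it appears in an exclude line."""
    return device not in excluded


# The commented examples from the file, uncommented for the sake of the sketch.
CFG = """# the mdmon will execute every 5 minutes.
interval = 5
cmd_timeout = 120
exclude md1
exclude 008ef942:540c0f64:ad58b352:a70017dd
"""
SETTINGS, EXCLUDED = parse_md_mon(CFG)
print(SETTINGS, EXCLUDED)
```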

mp_mon.cfg

#############################################################################
#@ $Id: mp_mon.cfg 2132 2014-08-22 06:47:32Z zhaofeif $
#@ $Rev: 2175 $
#@ $Author: baoliz $
#@ $Date: 2015-04-22 01:30:55 +0800 (Wed, 22 Apr 2015) $
#@ $LastChangedBy: baoliz $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [ignore_hba = yes|no]
# [ignore_san = yes|no]
# [ignore_alua = yes|no]
# exclude
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# exclude
#========================
# The “exclude ” syntax is used to force entries of the device name to be ignored.
#
# ignore_hba
#========================
# Default is no. If set yes, disable HBA redundancy check
#
# ignore_san
#========================
# Default is no. If set yes, disable SAN switch redundancy check
#
# ignore_alua
#========================
# Default is no. If set yes, disable ALUA redundancy check
#
############################################################################
# For example:

# disable this module
# disable = yes

# the mpmon will execute every 5 minutes.
# interval = 5

# disable HBA redundancy check
# ignore_hba = yes

# disable SAN switch redundancy check
# ignore_san = yes

# disable ALUA redundancy check
# ignore_alua = yes

# exclude some multipath device name
# exclude eva1_NA2log
# exclude eva1_sbd1

#############################################################################
# end of mp_mon.cfg
############################################################################

nfs_mon.cfg

###############################################################################
#@(#) $Id: nfs_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: nfs_mon.cfg
# Description: The NFS Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

#############################################################################
#—————————–
# DESCRIPTION
# This monitor checks the NFS file systems that are found already mounted. The check
# is a read test only (df command), performed through
# a forked child.
#
# ————————
# The configuration is per line based
# CONFIGURATION
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [GROUP]
# [GROUP , EventType]
# [GROUP , EventType, EventTypeInstance]
#
# exclude
# [*ACTION action]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# FS File system
#===================
# The local file system where the NFS export is expected to be mounted.
# If it does not exist, it is silently ignored
#
# The file system field also supports the * character.
# *
# will expand * as all nfs file systems defined as auto mount in /etc/fstab(Hpux/Linux), /usr/sbin/lsfs(AIX) or /etc/vfstab(Solaris).
# *
# will check the utilization of all NFS file systems that are already mounted.
# /user/*
# will check the file systems whose names begin with “/user/” .
#
#
# POLLING|POLLING,TIMEOUT|UTILIZATION
#================================
# POLLING|POLLING,TIMEOUT
# The format: $1 or $1,$2
# $1 is the number of runs to wait before performing the check. For instance, if you know
# this script is triggered by the ITO template every 15 minutes, and the value
# you set is 10, then the automounter will be checked (and therefore mounted if it’s
# not mounted) every 15*10 minutes = 150 minutes, approx.
# Please don’t set this value too low.
#
# $2 is the number of seconds after which the child process times out.
#
# UTILIZATION
# The format: x% or x(Mb|kb|Gb)
# x is the file system utilization value: x% checks the used space and x(Mb|kb|Gb) checks the free space.
# x%: for example, 10% means that if the utilization of the file system exceeds 10%, an alarm is sent to OVO.
# x(Mb|kb|Gb): for example, 20Mb means that if the free space of the file system is less than 20Mb, an alarm is triggered.
#
#
# EXCLUDE. Ignore filesystems
#=====================================
# you can set the directive Filesystem Exclude, for instance
# /tmp exclude
# /var/opt/SAP* exclude
#
# These file systems will be ignored: they are not checked and no action is executed for them
# However, note that the order IS RELEVANT, for instance
# /tmp 1 warning
# /tmp exclude
# will cause the /tmp be ignored, but
# /tmp exclude
# /tmp 1 warning
# will make the /tmp be considered.
#
#
# SEVERITY
#===================
# The SEVERITY can be one of the following:
# WARNING, MINOR, MAJOR, CRITICAL (case insensitive)
# If not stated explicitly, warning is assumed.
# Please be considerate with the severities; think twice before using CRITICAL, the most severe.
#
# SCHEDULE
#=====================
# You can define the time frame for each line/configuration by giving it a specific schedule.
# This means that a file or configuration is only considered when the current time falls
# within the configured schedule
# The format is – , some examples are:
# 0000-2400 * The basic configuration means in any day in any moment
# 0900-1800 1-5 Means to check from 09:00 to 18:00, Monday to Friday.
# 0000-2400 6,0 Means at any hour only Saturday and Sunday
#
#
# GROUPING
#==============================
# You can group the alarms by giving them the same GROUP. In that case
# only the MOST SEVERE ALARM is reported and the others are masked by it
# If you don’t set a group, the group NONE is assumed.
# Even when a message is masked by a more critical message
# in the same group, its action WILL BE EXECUTED, and the proper log message
# will be written.
#
# ACTIONS
#================
# You can declare an ACTION that will be triggered if the threshold is
# exceeded. The ACTION follows the syntax *ACTION
# Examples of allowed syntax:
# /tmp 1 warning mygroup
# *ACTION mount /tmp /tmp
#
# /home/userx 20% warning
# *ACTION echo “!FSNAME” >> /tmp/nfs.log
#
# Therefore, the ACTION can be on a separate line or directly follow the
# filesystem declaration (all on one line).
# Besides, some variables are replaced before the action is
# executed:
# !FSNAME (filesystem name)
# !GROUP (the group used)
# !MESSAGE (the same message that will be written in the logfile)
# !REASON (STAT or UTILIZATION, depending the reason that triggered the alarm)
# !THRESHOLD (The Threshold that has been exceeded)
# !CURRENT (The current value compared with the threshold)
# !SEVERITY (The severity declared in this alarm)
#
#
# General Considerations and Defaults
#=========================================
#
# If no line is present, no NFS will be checked, so there will be no messages at all
# It is highly recommended that FS managed by the automounter have a large polling interval,
# no less than 20
# In any case, a polling interval of less than 3 is discouraged
# If the group is not set, the default NONE is assumed
# If the severity is not set, the default WARNING is assumed
#
#
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state the object and EventType and leave EventTypeInstance at its default value: [OBJECT ,EventType ]
# But you can never use more than three fields or fewer than one
# Fields must be separated by commas; blank spaces are not allowed
# [ OS, OS, SapApp ]
# * 95 95 warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#
#EXAMPLES
#===========================
#REARM = true

# disable this module
# disable = yes

# allow this module to run every 10 minutes
# interval = 10

# Examples
#
#/mnt/tmp 10 critical
#/mount/m1 20 minor 0000-2400 * MYGROUP
# * 10 major
#
# ,
#/mnh 8, 20 mygroup
#
#
#/user/baoliz 20% mygroup
#*ACTION rm /user/baoliz/*.log
#/user/baoliz 200Mb minor 0000-2400 * MYGROUP
#/user/* 50% mygroup
#/user/who exclude
#
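
The SCHEDULE field described in nfs_mon.cfg pairs an hour range with a day specification. Here is a minimal Python sketch of how such a pair could be evaluated. The day numbering (0 = Sunday through 6 = Saturday) is inferred from the examples in the comments, treating the end time as exclusive is an assumption, and `day_matches` and `in_schedule` are hypothetical helper names.

```python
from datetime import datetime


def day_matches(spec, dow):
    """dow follows the numbering inferred from the examples: 0=Sunday .. 6=Saturday."""
    if spec == "*":
        return True
    for part in spec.split(","):
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= dow <= high:
                return True
        elif int(part) == dow:
            return True
    return False


def in_schedule(hours, days, now):
    """hours is 'HHMM-HHMM', days is '*', a list such as '6,0' or a range such as '1-5'."""
    start, end = (int(x) for x in hours.split("-"))
    hhmm = now.hour * 100 + now.minute
    dow = (now.weekday() + 1) % 7  # Python counts Monday=0; shift so that Sunday=0
    # Assumption: the end time is exclusive, so 0000-2400 covers the whole day.
    return start <= hhmm < end and day_matches(days, dow)


wednesday = datetime(2015, 10, 28, 13, 1)   # the date of the NTP alert above
saturday = datetime(2015, 10, 31, 10, 0)
print(in_schedule("0900-1800", "1-5", wednesday))
print(in_schedule("0000-2400", "6,0", saturday))
```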

nic_mon.cfg

############################################################################
# Description of parameters
# ——————————————————————-
#Syntax:
#
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
# [msg_group = ] #Specific message group here.
# [ping_retry = ] #Default ping_retry value is 2 even without ping_retry setting, which means default timeout is 2*5 seconds.
# With ping_retry setting, timeout will be *5 seconds, make sure the number is more than 2.
#AutoDiscovery=ENABLED #To discover the Network Interfaces as well as to validate the network connections and routing table.
#AutoDiscovery=DISABLED #To stop auto-discovery and manually configure and monitor Network Interfaces
# #Specific Network Interface to monitor along with FQDN and severity
#*EXCLUDE_NIC / #Excluding some specific Network Interface(s) or Gateway(s) from monitoring
#*CLUSTER_NIC #Monitor cluster Network Interface(s).
#############################################################################
#REARM = true

AutoDiscovery=DISABLED
#############################################################################

ntp_mon.cfg

###############################################################################
#@(#) $Id: ntp_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

############################################
# NTP_MON.CFG
# DESCRIPTION
#——————-
# Monitoring the NTP Daemon
# You can configure the monitoring to watch over
# the offset reported by the ntpq -p command
#
# PARAMS DESCRIPTION
#——————————
# [REARM = TRUE|FALSE]
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# “DISABLE =[yes|no]” : This is the switch for ntpmon running
# NTP_OFFSET_CRITICAL, NTP_OFFSET_WARNING set the critical and warning offset thresholds for the active peer; the default values are 500 and 250 in ms
# NTP_STRATUM_CRITICAL, NTP_STRATUM_WARNING set the stratum thresholds for peers whose status is one of “*”, “o”, “+” or “#”; the default values are 16 and 10
# NTP_ONLINE_TIME: ntpmon will not check anything if the ntpd process has not been running for more than NTP_ONLINE_TIME (minutes); the default value is 10 (min)
# ALARM_DELAY: ntpmon allows the alarm to be delayed by the given number of minutes.
#——————————
# PARAMS REQUIRED
#———————————–
#
# CONFIG FILE
#————————–
############################################
#
# EXAMPLE:
#
#
#################################################################
#
# Set your configuration from here
#REARM = TRUE

DISABLE = NO
NTP_OFFSET_CRITICAL 530
NTP_OFFSET_WARNING 230
NTP_STRATUM_CRITICAL 15
NTP_STRATUM_WARNING 12
NTP_ONLINE_TIME 3
#ALARM_DELAY 60

# Set your configuration from here
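
To see how the offset thresholds relate to the `ntpq -p` output shown earlier, here is a rough Python sketch. It is not the ntpmon logic: it checks every peer line rather than only the active peer, compares the absolute offset, and `peer_offsets` and `classify` are hypothetical helpers. The sample text is the output from the alert above and the thresholds are the 230/530 ms values from this file.

```python
SAMPLE = """remote refid st t when poll reach delay offset jitter
==============================================================================
ntp1 142.40.238.18 6 u 23 64 377 1.761 -23.727 9.886
ntp2 15.179.90.135 5 u 30 64 377 2.003 -48.137 4.418
"""


def peer_offsets(ntpq_output):
    """Map each peer (first column, tally character included if present) to its offset in ms."""
    offsets = {}
    for line in ntpq_output.splitlines()[2:]:  # skip the header and the separator line
        fields = line.split()
        if len(fields) != 10:
            continue  # not a regular peer line
        offsets[fields[0]] = float(fields[8])  # offset is the ninth column
    return offsets


def classify(offset_ms, warning=230, critical=530):
    """Assumption: the absolute offset is compared, and exceeding means strictly greater."""
    magnitude = abs(offset_ms)
    if magnitude > critical:
        return "critical"
    if magnitude > warning:
        return "warning"
    return "ok"


OFFSETS = peer_offsets(SAMPLE)
for peer, offset in OFFSETS.items():
    print(peer, offset, classify(offset))
```

With the thresholds in this file, both sample peers stay well below the warning level.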

ps_mon.cfg

###############################################################################
#@(#) $Id: ps_mon.cfg 2194 2015-05-15 12:57:28Z baoliz $
#@(#) $Rev: 2194 $
#@(#) $Author: baoliz $
#@(#) $Date: 2015-05-15 20:57:28 +0800 (Fri, 15 May 2015) $
#@(#) $LastChangedBy: baoliz $
# #############################################################################

#############################################################################
# File: ps_mon.cfg
# Description: The Unix process Monitor Configuration file
# Package : RMM UXMON
#############################################################################

#############################################################################
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [ACTION_TIMEOUT = ]
# [ ] []
#
# [*PINFO ]
# [*PUSER ]
# [*ACTION ]
# [*PACKAGE ]
# [*IPISLOCAL ]
# [*ELAPSED ]
# [*PATH ]
# [*FILEEXISTS ]
# [*ARGS ]
# [*EXACT_ARGS ]
# [*WITHOUT_ARGS ]
# [*ZONE ]
# [*DURATION ]
#
# = n | n- | n-m
# n : exactly n instances
# n- : n or more instances
# n-m : n to m instances
#
# = – [,]
#
# = critical | major | minor | warning
#
# = Free Text without blanks Max 8 chars
# This text will be used in the ITO message to set the OBJECT ito message field
# so it’s possible to use it in order to define the ITO-OVSD mappings
# It is also used for message grouping. Only one message of a group will
# be reported, the one with the highest severity; the others will be masked (although
# their respective actions will be executed)
#
#
# Note 1: Some processes change names after they are invoked so be sure
# to use the name as listed by ps OS command
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
# ACTION
#===================
# You can declare an ACTION, that will be triggered if the threshold is
# exceeded. The ACTION follows the syntax *ACTION
# Examples of allowed syntax:
# *ACTION ps -e -o pid !PID
#
# Therefore, the ACTION can be on a separate line or directly follow the
# declaration (all on one line).
# Besides, some variables are replaced before the action is
# executed:
# !PROCESS (process name)
# !GROUP (the group used)
# !MESSAGE (the same message that will be written in the logfile)
# !REASON (SPACE or INODE, depending the reason that triggered the alarm)
# !THRESHOLD (The Threshold that has been exceeded)
# !CURRENT (The current value compared with the threshold)
# !SEVERITY (The severity declared in this alarm)
# !PID (The pid number)
#
#
# ACTION_TIMEOUT
#=======================
# When the timeout expires, the action process is killed. If not specified, the default is 15 minutes.
#
#
# PACKAGE. Cluster Awareness
#=============================
# There is another directive, *PACKAGE pkgname, that can be used to inform PSMON that
# this resource is managed by a cluster, and more specifically included in the package named
# pkgname
# Before testing that entry, PSMON checks whether the cluster package is in fact
# running on this local node; if yes, it processes it like any other entry. If it’s not
# running on this node (and is therefore running on another one), it ignores this entry

# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP

# DURATION
#=============================
# The ticket is triggered after duration minutes and will be held within DURATION minutes, if DURATION is configured.

# GROUPING
#====================
# You can group the alarms by giving them the same GROUP. In that case
# only the MOST SEVERE ALARM is reported
# If you don’t set a group, the group NONE is assumed.
# Even when a message is masked by a more critical message
# in the same group, its action WILL BE EXECUTED, and the proper log message
# will be written.
#
# ACTION. AUTORECOVERY TOTAL
#==============================
# Each time an alarm happens, an action (if defined) is triggered.
# However, a second check will happen, so this monitoring is based on two phases
# first, checks the alarms, and triggers the actions if any.
# second, evaluates the alarms, and, only those who persist will be logged
# in that way, ACTION can help to automatize the maintenance and reduce the number of alarms.
#
# FILTER. PUSER
#=================
# PUSER : This directive filters by the user who owns the process (real user or effective user). If you set it,
# only processes with that user name will be considered, for instance
# httpd warning 1-
# *PUSER webuser
#
# FILTER. PATH
#==================
# PATH: You can filter by the PATH. If you set the path /usr/bin for a process P and
# the system finds /opt/bin/P, it is ignored. It works in the same way as PUSER
# process warning 1-
# *PATH /usr/bin
#
# FILTER. FILEEXISTS
#==================
# FILEEXISTS: This is used to check whether the process binary file exists. If the binary file doesn’t exist, the process check is skipped without an alarm.
#
# process warning 1-
# *FILEEXISTS /usr/bin/process
#
# FILTER. ARGS
#====================
# ARGS: A process can have arguments or switches, a way to discriminate between them
# is using this directive.
# *ARGS -d root
# *ARGS file=myfile.log
# *ARGS argu # “argu” can match, “argument” also can match
# *ARGS /.*/test.sh # wildcard .* can be used
#
# FILTER. EXACT_ARGS
#====================
# EXACT_ARGS: much like the ARGS directive, but with EXACT_ARGS the arguments must match exactly, and
# a longer string that merely contains this argument no longer matches, for example
# *EXACT_ARGS argu # “argu” can match, “argument” can’t match any longer
#
# FILTER. WITHOUT_ARGS
#====================
# WITHOUT_ARGS: A process can be running without arguments or switches, for example
#
# Xvnc major 0 SECURITY
# *WITHOUT_ARGS -localhost
#
# That means 0 Xvnc processes may be running without the argument -localhost; in other words, if Xvnc is running it must be with
# the argument -localhost
#
# FILTER. ZONE
#======================
# ZONE: In some operating systems exists zones and different spaces for running processes,
# as is the case of SOLARIS 10 and their ZONES. By default PSMON looks to all processes
# available, but, if you want to focus only in one specific ZONE, you can use this option
#
#
# ELAPSED TIME directive
#==========================
# ELAPSED : You can set an alarm to check how much time a process has been running
# in minutes, hours or days, for instance
# *ELAPSED 5m
# *ELAPSED 4d
# *ELAPSED 36h
# In case any instance of that process exceeds the threshold you will get an alarm
#
#
# RESOURCE CONSUMPTION. PINFO directive
#=======================================
# PINFO: You can monitor how much total CPU a process is taking, or how much
# memory (virtual memory) it is taking. In the case of memory, note this value
# is platform dependent, and it corresponds to the VSZ field of the ps command
# http 1-
# *PINFO 80 1900
#
# Note that the CPU means the total cpu , with all cpu’s averaged, so ,
# in case you have 4 cpu a value of 25 means 100% of one CPU. This is not
# exact, but it gives a good approximation.
#
# CMA SUPPORT
#========================================
# You can access the EventType and EventTypeInstance using the [ ] bracket statements
# to set which values are to be used by the following lines
# The syntax of such line is : [ OBJECT , EventType, EventTypeInstance ]
# And the Default values for all cases are NONE
# You can state only the object : [ OBJECT ]
# You can state the object and EventType and leave EventTypeInstance at its default value: [OBJECT ,EventType ]
# But you can never use more than three fields or fewer than one
# Fields must be separated by commas; blank spaces are not allowed
# [ OS, OS, SapApp ]
# process 1- warning
#
# NOTE: When a line has set a GROUP or OBJECT and this conflicts with the [ ] statement, the line field will be used
#############################################################################

#############################################################################
# Examples
# REARM = true
# disable this module
# disable = yes

# allow this module to run every 10 minutes
# interval = 10

# Check that at least one httpd process is running, assign to group WEB, filter
# by user apache. And in case the threshold is exceeded, execute the action.
# httpd 1- WEB
# *PUSER apache
# *ACTION echo !PROCESS > logactions.log
#
#

#####################
# End of examples #
#####################

###############################
# Start your config from here #
###############################

#############################################################################
# end of ps_mon.cfg
#############################################################################
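
The instance-count field in ps_mon.cfg accepts three shapes: `n`, `n-` and `n-m`. A short Python sketch of that grammar follows; `parse_count` and `count_ok` are hypothetical names used only for this illustration, not part of UXMON.

```python
def parse_count(spec):
    """Return (minimum, maximum) for a count spec; maximum None means no upper bound."""
    if spec.endswith("-"):
        return int(spec[:-1]), None      # n-  : n or more instances
    if "-" in spec:
        low, high = spec.split("-")
        return int(low), int(high)       # n-m : n to m instances
    n = int(spec)
    return n, n                          # n   : exactly n instances


def count_ok(spec, running):
    """True when the number of running instances satisfies the spec."""
    low, high = parse_count(spec)
    return running >= low and (high is None or running <= high)


print(count_ok("1-", 0))   # httpd 1- with no httpd running
print(count_ok("0", 1))    # Xvnc 0 with one matching Xvnc running
print(count_ok("2-4", 3))
```

In the `httpd 1-` example from the file, zero running processes falls outside the spec, and that is the case that would raise the alarm and trigger the action.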

rc_mon.cfg

#!/usr/bin/perl
##################################################################################
#@(#) $Id: rc_mon.cfg-all-other 2149 2015-03-03 08:45:34Z zhaofeif $
#@(#) $Rev: 2149 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
##################################################################################
#
# Resource Monitoring configuration file
#__________________________________________
#
#
# In this file you can configure the UXMONrcmon giving the thresholds to the metrics
# you have defined in the metrics configuration file. In this way, setting these thresholds
# you can be warned with an OVO alarm when the metric value exceeds such thresholds
# or falls below them.
#
# Syntax description
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run only after the interval (in minutes) has elapsed
#
#———————-
# You can use the [GROUP] label to define a default GROUP for the underlying lines; however, note that
# in the current release (UXMON 1.2) it is useless. It is simply there for future versions. For now it does
# no harm to have it there, but it will not do anything.
#
# METRIC CATEGORY INSTANCE SEVERITY THRESHOLD APPLICATION OBJECT SCHEDULE NODEGROUP
# METRIC: between ” chars is the metric name defined in the Metrics configuration file. Case Sensitive.
# CATEGORY: Each metric must be within a category, that defines subsets of metrics. In this way we avoid name collision
# as two metrics with the same name can coexist as long as they belong to different Categories
# (as defined in the METRICS CONFIGURATION FILE)
# INSTANCE: A metric can measure several “instances” of one concept, for instance, disk space, each file system is an instance in this case
# Note that the instances must be returned by the execution of the metric defined in the METRICS CONFIGURATION FILE
# SEVERITY: Typical values: Critical, Major, Minor, Warning
# THRESHOLD: The limit that, if crossed, raises an alarm. You can set > or < [...]
# [...] 4.5 default default Sun-Sat@00:00-23:59 #NodeGroup: UX_AR_test
#"File System Total (MB)" [diskspace] * warning >100Mb default group Sun-Sat@0:00-23:59 #NodeGroup: UX_AR_test
#
# Note, these examples were given to illustrate the syntax. If you want to use them, make sure these Metrics and Categories are defined in the
# METRICS configuration file

#
# Threshold definition for kernel parameters monitoring (not all are monitored on all platforms!)
# For monitored parameters see the metrics definition file UXMONmetrics.cfg and comment out unnecessary lines below.
#
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NP Major >80 default default Sun-Sat@0:00-23:59 #NodeGroup: ktsmon
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NI Major >6 default default Sun-Sat@00:00-23:59 #NodeGroup: ktsmon
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NF Major >6 default default Sun-Sat@00:00-23:59 #NodeGroup: ktsmon
#"Kernel Parameters Check (%): NP, NI, NF, NK" [kernel] NK Major >6 default default Sun-Sat@00:00-23:59 #NodeGroup: ktsmon

#
# Threshold definition for user quota monitoring
#
#"User Quota Check (%)" [quota] user1 Major >90 default default Sun-Sat@0:00-23:59 #NodeGroup: uq_mon
#"User Quota Check (%)" [quota] user2 Warning >80 default default Sun-Sat@0:00-23:59 #NodeGroup: uq_mon

#
# Threshold definition for monitoring of total swap space used (the keyword "total" is mandatory in configuration)
#
#"Total Swap Space Check (%)" [swap] total Major >95 default default Sun-Sat@00:00-23:59 #NodeGroup: swapmon
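
To make the threshold syntax of rc_mon.cfg more concrete, here is a rough Python sketch that splits one threshold line into its fields and compares a sample value against it. This is not how UXMONrcmon itself parses the file: the names `LINE_RE`, `parse_threshold_line` and `breaches` are made up for this example, the field layout is taken only from the comments above, and the line is assumed to use plain ASCII double quotes.

```python
import re

# Hypothetical pattern for one rc_mon.cfg threshold line:
# "METRIC" [CATEGORY] INSTANCE SEVERITY THRESHOLD APPLICATION OBJECT SCHEDULE
LINE_RE = re.compile(
    r'^"(?P<metric>[^"]+)"\s+\[(?P<category>[^\]]+)\]\s+(?P<instance>\S+)\s+'
    r'(?P<severity>\S+)\s+(?P<op>[<>])(?P<limit>[0-9.]+)\S*\s+'
    r'(?P<application>\S+)\s+(?P<object>\S+)\s+(?P<schedule>\S+)'
)


def parse_threshold_line(line):
    """Return the fields of a threshold line as a dict, or None if it does not parse."""
    # Drop everything from the first "#": this removes the trailing
    # "#NodeGroup: ..." part and turns commented-out lines into empty strings.
    line = line.split("#", 1)[0].strip()
    match = LINE_RE.match(line)
    if match is None:
        return None
    rule = match.groupdict()
    rule["limit"] = float(rule["limit"])
    return rule


def breaches(rule, value):
    """Assumed semantics: '>' alarms above the limit, '<' alarms below it."""
    if rule["op"] == ">":
        return value > rule["limit"]
    return value < rule["limit"]
```

Feeding it the swap line from the file (uncommented) gives the category `swap`, the instance `total` and a limit of 95, so a measured value of 97 would count as a breach under these assumptions.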

sc_mon.cfg

###############################################################################
#@(#) $Id: sc_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: sc_mon.cfg
#
#############################################################################
#
# configuration for SC monitoring
#
# Description of parameters
# ---------------------------------------------------------------------------
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run again only after this interval, in minutes.
#
# RG[0]=XYZ Package name 1
# RG_NODE[0]=ABC Primary node on which the package must run
# Setting this to * disables the check for running on an adoptive node
# RG_SWTCH[0]=1 Set to 1 if Package_switching should be ENABLED
# Set to 0 if Package_switching must not be ENABLED
#
# numbers in [] represent a package
#
#
#############################################################################
# example configuration:
# REARM = TRUE
# RG[0]=sg_pack_one; RG_NODE[0]=sg_node_one; RG_SWTCH[0]=1
# RG[1]=sg_pack_two; RG_NODE[1]=sg_node_two; RG_SWTCH[1]=1
# RG[2]=sg_pack_three; RG_NODE[2]=*; RG_SWTCH[2]=1
#############################################################################
# end of sc_mon.cfg
#############################################################################

scsi_mon.cfg

###############################################################################
#@(#) $Id: scsi_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: scsi_mon.cfg
# Description: The multi path Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
#
################################################################################
#
# Syntax:
#
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]

# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run again only after this interval, in minutes.
#
#1. EXCLUDE_DEVICES=
# default: is empty

#2. PROTOCOLS=
# default: PROTOCOLS=fibre_channel

#3. CHECK_UNOPEN_DEVICES=
# default: CHECK_UNOPEN_DEVICES=yes
# The state is from command “scsimgr get_info -D device”

#4. DISABLE_MULTIPATH_MONITORING=
# default: DISABLE_MULTIPATH_MONITORING=no
# If this is set to yes, the monitor is not triggered at all.

#5. CHECK_REDUNDANT=
# default: CHECK_REDUNDANT=no
# If this is set to yes, path redundancy is checked; otherwise it is ignored.

# EXCLUDE_DEVICES
# ==================
# If the argument contains "*", the disk check treats it as a regular expression (e.g. disk1* matches disk1, disk12, disk122). In this case only one argument is allowed.

# If the argument(s) are given without "*", the disk check explicitly excludes those disk(s).
# If more than one argument is supplied, "*" cannot be used. Arguments must be separated by commas.
#
#
# PROTOCOLS
#===================
# Supply the device types separated by commas if more than one is specified; otherwise write the single device type without a comma.
#############################################################################
#
#
################################################################################
#
# examples
#
#EXCLUDE_DEVICES=disk4 ,disk6,disk7
#PROTOCOLS= parallel_scsi
#CHECK_UNOPEN_DEVICES= yes
#DISABLE_MULTIPATH_MONITORING=no
#
#
#############################################
# Make your CONFIGURATION FROM HERE #
#############################################
#REARM = TRUE
EXCLUDE_DEVICES=
PROTOCOLS= fibre_channel
CHECK_UNOPEN_DEVICES= no
DISABLE_MULTIPATH_MONITORING= no
CHECK_REDUNDANT=no
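
The two EXCLUDE_DEVICES modes described in the comments can be sketched in Python as follows. This is only an illustration: `is_excluded` is a made-up helper, and treating a trailing `*` as a simple prefix match is an assumption based on the `disk1*` example in the file. The real monitor may evaluate the value as a true regular expression.

```python
def is_excluded(device, exclude_devices):
    """Return True if `device` is covered by an EXCLUDE_DEVICES value."""
    # Split the comma-separated value and tolerate stray blanks ("disk4 ,disk6").
    entries = [entry.strip() for entry in exclude_devices.split(",") if entry.strip()]
    if len(entries) == 1 and entries[0].endswith("*"):
        # Single wildcard argument: handled here as a prefix match, which
        # reproduces the documented example (disk1* matches disk1, disk12, disk122).
        return device.startswith(entries[0][:-1])
    # Otherwise every argument names exactly one device.
    return device in entries
```

With `EXCLUDE_DEVICES=disk4 ,disk6,disk7` from the examples section, `disk6` would be skipped and `disk5` would still be checked.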

sg_mon.cfg

###############################################################################
#@(#) $Id: sg_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: sg_mon.cfg
# Description: Check service guard Package monitoring script
# Package : Concorde – UXSM
# Version: A.01.00
#
#############################################################################
#
# configuration for SG monitoring
# Script sg_mon.ksh
#
# Description of parameters
# ---------------------------------------------------------------------------
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = <minutes>]

# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run again only after this interval, in minutes.
#
# [DISABLE_CLUSTER_LOCK_CHECK = yes|no]
#================
# Automatic checking of the Cluster Lock (Lock Disk, Lock LUN, or Quorum server) is supported for SG version A.11.17 and later,
# and customized QUORUM_SERVER specifications will be ignored.
# If set to YES (or yes), automatic checking of the Cluster Lock is disabled.

# PKG[0]=XYZ Package name 1
# PKG_NODE[0]=ABC Primary node on which the package must run
# Setting this to * disables the check for running on an adoptive node
# PKG_SWTCH[0]=1 Set to 1 if Package_switching should be ENABLED
# Set to 0 if Package_switching must not be ENABLED
#
# numbers in [] represent a package
#
# QUORUM_SERVER[0]=server_name Quorum server name
# QUORUM_NODE[0]=node_name node name in the cluster
#
# Global parameter:
# LAN_MON=1 (default) If LAN interfaces should be monitored (up/down).
# LAN_MON=0 If LAN interfaces should not be monitored.
# GROUP=NONE Default is NONE. In that case
# only the MOST SEVERE ALARM is reported
# and the others are masked by it
#############################################################################
# example configuration:
# REARM = TRUE
# PKG[0]=sg_pack_one; PKG_NODE[0]=sg_node_one; PKG_SWTCH[0]=1
# PKG[1]=sg_pack_two; PKG_NODE[1]=sg_node_two; PKG_SWTCH[1]=1
# PKG[2]=sg_pack_three; PKG_NODE[2]=*; PKG_SWTCH[2]=1
# QUORUM_SERVER[0]=server_name_one; QUORUM_NODE[0]=sg_node_name_one
# QUORUM_SERVER[1]=server_name_two; QUORUM_NODE[1]=sg_node_name_two
# LAN_MON=1
# GROUP=MYGROUP
#############################################################################
# end of sg_mon.cfg
#############################################################################
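
The indexed `PKG[n]` / `PKG_NODE[n]` / `PKG_SWTCH[n]` entries group naturally by the number in brackets. The Python sketch below shows one way to read them into a dictionary per package. It is an illustration only, not the logic of sg_mon.ksh; `ASSIGN_RE` and `parse_packages` are names invented here, and the handling of `;` and `#` is an assumption drawn from the example configuration above.

```python
import re

# Hypothetical pattern for one "KEY[index]=value" assignment.
ASSIGN_RE = re.compile(r'^(PKG|PKG_NODE|PKG_SWTCH)\[(\d+)\]=(.*)$')


def parse_packages(text):
    """Group PKG, PKG_NODE and PKG_SWTCH assignments by their index."""
    packages = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Several assignments may share one line, separated by ";".
        for part in line.split(";"):
            match = ASSIGN_RE.match(part.strip())
            if match:
                key, index, value = match.group(1), int(match.group(2)), match.group(3).strip()
                packages.setdefault(index, {})[key] = value
    return packages
```

A package whose `PKG_NODE` comes back as `*` is one where the primary-node check is switched off, as described in the comments.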

sshd_mon.cfg

###############################################################################
#@(#) $Id: sshd_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: sshd_mon.cfg
# Description: The sshd Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################
############################################################################
# Description of parameters
# -------------------------------------------------------------------
#Syntax:
#
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = <minutes>]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run again only after this interval, in minutes.
#
# [CUSTOM_SSHD_BIN_LIST = ]
# Default SSHD path is /opt/sshd/sbin/sshd, /usr/local/sbin/sshd, /usr/sbin/sshd, /opt/ssh/sbin/sshd, /usr/lib/ssh/sshd
# [CUSTOM_SSHD_PID_LIST = ]
# Default sshd.pid file path is /var/run/sshd.pid, /var/run/sshd.init.pid, /var/run/sshd-quest.pid, /usr/local/etc/sshd.pid, /var/openssh/sshd.pid,/etc/sshd.pid
# If the sshd binary or the sshd.pid file is not in the default list, define a comma-separated list of paths as follows.
# E.g.:
# CUSTOM_SSHD_BIN_LIST = /usr/mypath1/sshd, /usr/mypath2/sshd
# CUSTOM_SSHD_PID_LIST = /var/mypath/sshd.pid
#############################################################################
# REARM = TRUE

#############################################################################

svcs_mon.cfg

###############################################################################
# GD UX MON #
#@(#) $Id: svcs_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2012-07-25 18:54:50 +0800
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: svcs_mon.cfg
# Description: The File Activity Monitor Configuration file
# Package : GD UXMON (AROA PROJECT)
#############################################################################

################################################################################
#
# The intention of this script is to monitor solaris service status
#
#

################################################################################
#
# Syntax:
# -------------------------------------------------------------------
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = <minutes>]
#[EXCLUDE_SERVICE = ]
#
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module is allowed to run again only after this interval, in minutes.
#
#===============
# Excludes specific services from monitoring, for instance
# EXCLUDE_SERVICE = /application/print/server
#############################################################################
# REARM = TRUE

#############################################################################

swap_mon.cfg

###############################################################################
#@(#) $Id: swap_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@(#) $Rev: 2162 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################
#
# All lines which start with a hash-sign (#) will be ignored
# the whole config file is case-insensitive
#
# THERE ARE NO SYNTAX CHECKS SO MAKE SURE THAT YOUR CONFIGURATION
# IS CORRECT!!!
#
# Every config line has the following layout:
# total <percent_used> <severity> <alert_type> [<from-to> [<days>]]
#
# Every config line must start with “total” because every check is
# performed on the totally free space
#
# The percent used must be given as a positive integer between 0 and 100.
# If the given percentage is exceeded, an alarm
# is raised.
#
# In the third column you have to configure a severity which is used for
# this alert. Possible severities are: warning, major, critical
#
# With the alert type you can configure which alarm will be raised. Possible
# values are:
# B -> Browser
# N -> Browser+Notification
# T -> Browser+Trouble Ticket
# NT -> Browser+Notification+Trouble Ticket
#
# The from-to gives the time of the day when the script shall run.
# The from and the to time are given as 4-digit-numbers in the 24h format
# So “all day long” would be “0000-2400” (in the configline without the quotes!)
# If you don’t configure a from-to time, the script will run all day long
#
# If you have configured the daytime on which the script shall run, you can also
# configure on which days the script will run. The days are given with crontab
# syntax. You can either give multiple days separated with commas from each other
# (e.g. “1,3,4” -> Monday, Wednesday, Thursday) or you can give span of days
# separated with a hyphen (e.g. “2-4” -> from Tuesday until Thursday).
# In this case the first number has always to be lower than the second one!!!
# As you could already see in the examples, the week starts with “0” for sunday
# and ends with “6” for saturday.
#
# You can configure multiple percent_used entries for the same day(s), the
# script will always raise the alert of the highest threshold that is exceeded
#
#[REARM = TRUE|FALSE]
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
###############################################################################

################################################################################
# Example configuration
###############################################################################
#
#REARM = TRUE

#total percent_used severity Alert FROM-TO Days
total 95 major T 0000-2400 *
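
The rules in swap_mon.cfg (a line starts with `total`, days use crontab numbering, and the highest exceeded threshold wins) can be sketched in Python like this. It is a simplified illustration, not the module's own code: the function names are invented, the from-to time window is parsed but not evaluated, and the default values for omitted columns are assumptions based on the comments.

```python
def parse_swap_line(line):
    """Parse 'total percent severity alert [from-to [days]]'; return None for other lines."""
    fields = line.split()
    if len(fields) < 4 or fields[0].lower() != "total":
        return None  # comments and anything not starting with "total"
    return {
        "percent": int(fields[1]),
        "severity": fields[2].lower(),
        "alert": fields[3],
        # Assumed defaults: all day long, every day.
        "from_to": fields[4] if len(fields) > 4 else "0000-2400",
        "days": fields[5] if len(fields) > 5 else "*",
    }


def day_matches(days, weekday):
    """weekday follows the crontab convention from the comments: 0 = Sunday ... 6 = Saturday."""
    if days == "*":
        return True
    for part in days.split(","):
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            if first <= weekday <= last:
                return True
        elif int(part) == weekday:
            return True
    return False


def pick_alert(rules, used_percent, weekday):
    """Return the rule with the highest threshold that is exceeded today, or None."""
    hits = [r for r in rules
            if day_matches(r["days"], weekday) and used_percent > r["percent"]]
    return max(hits, key=lambda r: r["percent"]) if hits else None
```

With the shipped line `total 95 major T 0000-2400 *`, swap usage of 97% would select the major rule on any day, while 90% would select nothing.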

UXMONbroker.cfg

###############################################################################
#@(#) $Id: UXMONbroker.cfg.linux.ovo8 2177 2015-04-29 08:06:01Z zhaofeif $
#@(#) $Rev: 2177 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-04-29 16:06:01 +0800 (Wed, 29 Apr 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

#$UXMON_OVO_RELEASE = "02.01.03";

############################
# OVO Agent information
$UXMON_OVO_BASE = "/var/opt/OV";
$UXMON_OVO_TMP = $UXMON_OVO_BASE."/tmp/OpC";
$UXMON_OVO_CMDS = $UXMON_OVO_BASE."/bin/instrumentation";
$UXMON_OVO_DFT_CFG = $UXMON_OVO_CMDS;
$UXMON_OVO_LOG = $UXMON_OVO_BASE."/log/OpC";
$UXMON_OVO_CFG = $UXMON_OVO_BASE."/conf/OpC";

##################################################
# OS Commands and info
# PLATFORM DEPENDENCY
$UXMON_OS_HOSTNAME = hostname();
$UXMON_SG_CMVIEWCL = "/usr/local/cmcluster/bin/cmviewcl";
$UXMON_VT_HAGRP = "/opt/VRTSvcs/bin/hagrp";
$UXMON_CRON_LOG = "/var/spool/mail/root";
$UXMON_SHELL = "/bin/bash";
$UXMON_HACMP_HAGRP = "/usr/es/sbin/cluster/utilities/clRGinfo";

$UXMON_PATH_SGMON = "/usr/local/cmcluster/bin:/opt/cmcluster/bin:/usr/sbin";

################
# Perl runtime to be used internally
$UXMON_OVO_PERL = "/opt/OV/nonOV/perl/a/bin/perl";

#######################
# Default Values
$UXMON_DFT_SEVERITY = "warning";
$UXMON_DFT_GROUP = "NONE";
$UXMON_DFT_EVENTTYPE = "NONE";
$UXMON_DFT_EVENTINSTANCE = "NONE";

$UXMON_SC_SCSTAT = "/usr/cluster/bin/scstat";
$UXMON_MODULE_METRICS_FILE = "UXMONmetrics.cfg";
$UXMON_MODULE_METRICS_PATH = "$UXMON_OVO_CFG/$UXMON_MODULE_METRICS_FILE";

##############################
# Threshold, in hours, after which a module is considered inactive
$UXMON_MODULE_INACTIVE = 10;

################################
# The number of perl processes above which something is considered wrong
$UXMON_MAX_PERL_INSTANCES = 100;

##########################
# TIMEOUT for OS commands, in seconds. When a command exceeds it, the 'got block' alarm is triggered.
$UXMON_OS_TIMEOUT = 300 ;

##########################
# TIMEOUT for a module run, as a number of INTERVAL periods. When a run exceeds it, the 'running timeout' alarm is triggered.
$UXMON_MODULE_TIMEOUT = 3;

#############################################################
$UXMON_MODULE_EXEC{dfmon} = "UXMONdfmon";
$UXMON_MODULE_EXEC{psmon} = "UXMONpsmon";
$UXMON_MODULE_EXEC{volmon} = "UXMONvolmon";
$UXMON_MODULE_EXEC{cronmon} = "UXMONcronmon";
$UXMON_MODULE_EXEC{actmon} = "UXMONactmon";
$UXMON_MODULE_EXEC{ntpmon} = "UXMONntpmon";
$UXMON_MODULE_EXEC{nfsmon} = "UXMONnfsmon";
$UXMON_MODULE_EXEC{lpmon} = "UXMONlpmon";
$UXMON_MODULE_EXEC{perfmon} = "UXMONperfmon";
$UXMON_MODULE_EXEC{sshdmon} = "UXMONsshdmon";
$UXMON_MODULE_EXEC{rcmon} = "UXMONrcmon";
$UXMON_MODULE_EXEC{evm} = "UXMONevm";
$UXMON_MODULE_EXEC{loopmon} = "UXMONloopmon";
$UXMON_MODULE_EXEC{uxmon} = "UXMONuxmon";
$UXMON_MODULE_EXEC{advfsmon} = "UXMONadvfsmon";
$UXMON_MODULE_EXEC{dmesg} = "UXMONdmsg";
$UXMON_MODULE_EXEC{bootmon} = "UXMONbootmon";
$UXMON_MODULE_EXEC{scmon} = "UXMONscmon";
$UXMON_MODULE_EXEC{sgmon} = "UXMONsgmon";
$UXMON_MODULE_EXEC{vcmon} = "UXMONvcmon";
$UXMON_MODULE_EXEC{ktsmon} = "UXMONktsmon";
$UXMON_MODULE_EXEC{selfcheck} = "UXMONselfcheck";
$UXMON_MODULE_EXEC{nicmon} = "UXMONnicmon";
$UXMON_MODULE_EXEC{hwmon} = "UXMONhwmon";
$UXMON_MODULE_EXEC{mdmon} = "UXMONmdmon";
$UXMON_MODULE_EXEC{bondmon} = "UXMONbondmon";
$UXMON_MODULE_EXEC{mpmon} = "UXMONmpmon";
$UXMON_MODULE_EXEC{swapmon} = "UXMONswapmon";

@UXMON_LIST_MODULES = keys (%UXMON_MODULE_EXEC);

##############################################################################################
# The complete execution command
foreach $k (@UXMON_LIST_MODULES)
{
    $UXMON_MODULE_EXECPATH{$k} = catfile($UXMON_OVO_CMDS,$UXMON_MODULE_EXEC{$k});
}

###########################################################
# Collect interface, only defined for these three modules
$UXMON_MODULE_INTERFACE{dfmon} = "UXMONcollectDFMON";
$UXMON_MODULE_INTERFACE{psmon} = "UXMONcollectPSMON";
$UXMON_MODULE_INTERFACE{actmon} = "UXMONcollectACTMON";

#################################################
# The logfile
$UXMON_MODULE_LOGFILE{dfmon} = "df_mon.log";
$UXMON_MODULE_LOGFILE{psmon} = "ps_mon.log";
$UXMON_MODULE_LOGFILE{actmon} = "act_mon.log";
$UXMON_MODULE_LOGFILE{volmon} = "vol_mon.log";
$UXMON_MODULE_LOGFILE{cronmon} = "cron_mon.log";
$UXMON_MODULE_LOGFILE{ntpmon} = "ntp_mon.log";
$UXMON_MODULE_LOGFILE{lpmon} = "lp_mon.log";
$UXMON_MODULE_LOGFILE{nfsmon} = "nfs_mon.log";
$UXMON_MODULE_LOGFILE{perfmon} = "perf_mon.log";
$UXMON_MODULE_LOGFILE{dmesg} = "dmsg_mon.hist";
$UXMON_MODULE_LOGFILE{ktsmon} = "kts_mon.log";
$UXMON_MODULE_LOGFILE{sshdmon} = "sshd_mon.log";
$UXMON_MODULE_LOGFILE{rcmon} = "rc_mon.log";
$UXMON_MODULE_LOGFILE{evm} = "evm_mon.log";
$UXMON_MODULE_LOGFILE{loopmon} = "loop_mon.log";
$UXMON_MODULE_LOGFILE{uxmon} = "uxmon.log";
$UXMON_MODULE_LOGFILE{advfsmon} = "advfs_mon.log";
$UXMON_MODULE_LOGFILE{dmesg} = "dmsg_mon.log";
$UXMON_MODULE_LOGFILE{bootmon} = "boot_mon.log";
$UXMON_MODULE_LOGFILE{scmon} = "sc_mon.log";
$UXMON_MODULE_LOGFILE{sgmon} = "sg_mon.log";
$UXMON_MODULE_LOGFILE{vcmon} = "vc_mon.log";
$UXMON_MODULE_LOGFILE{selfcheck} = "uxmon.log";
$UXMON_MODULE_LOGFILE{nicmon} = "nic_mon.log";
$UXMON_MODULE_LOGFILE{hwmon} = "hw_mon.log";
$UXMON_MODULE_LOGFILE{mdmon} = "md_mon.log";
$UXMON_MODULE_LOGFILE{bondmon} = "bond_mon.log";
$UXMON_MODULE_LOGFILE{mpmon} = "mp_mon.log";
$UXMON_MODULE_LOGFILE{swapmon} = "swap_mon.log";

#########################################################
# The complete LOGPATH
foreach $k (@UXMON_LIST_MODULES)
{
    $UXMON_MODULE_LOGPATH{$k} = catfile($UXMON_OVO_LOG,$UXMON_MODULE_LOGFILE{$k});
}

################################################################
# The config file
$UXMON_MODULE_CFGFILE{dfmon} = "df_mon.cfg";
$UXMON_MODULE_CFGFILE{psmon} = "ps_mon.cfg";
$UXMON_MODULE_CFGFILE{actmon} = "act_mon.cfg";
$UXMON_MODULE_CFGFILE{volmon} = "vol_mon.cfg";
$UXMON_MODULE_CFGFILE{cronmon} = "cron_mon.cfg";
$UXMON_MODULE_CFGFILE{ntpmon} = "ntp_mon.cfg";
$UXMON_MODULE_CFGFILE{lpmon} = "lp_mon.cfg";
$UXMON_MODULE_CFGFILE{nfsmon} = "nfs_mon.cfg";
$UXMON_MODULE_CFGFILE{perfmon} = "perf_mon.cfg";
$UXMON_MODULE_CFGFILE{ktsmon} = "kts_mon.cfg";
$UXMON_MODULE_CFGFILE{sshdmon} = "sshd_mon.cfg";
$UXMON_MODULE_CFGFILE{evm} = "evm_mon.cfg";
$UXMON_MODULE_CFGFILE{rcmon} = "rc_mon.cfg";
$UXMON_MODULE_CFGFILE{loopmon} = "loop_mon.cfg";
$UXMON_MODULE_CFGFILE{uxmon} = "uxmon.cfg";
$UXMON_MODULE_CFGFILE{advfsmon} = "advfs_mon.cfg";
$UXMON_MODULE_CFGFILE{dmesg} = "dmsg_mon.cfg";
$UXMON_MODULE_CFGFILE{bootmon} = "boot_mon.cfg";
$UXMON_MODULE_CFGFILE{scmon} = "sc_mon.cfg";
$UXMON_MODULE_CFGFILE{sgmon} = "sg_mon.cfg";
$UXMON_MODULE_CFGFILE{vcmon} = "vc_mon.cfg";
$UXMON_MODULE_CFGFILE{selfcheck} = "";
$UXMON_MODULE_CFGFILE{nicmon} = "nic_mon.cfg";
$UXMON_MODULE_CFGFILE{hwmon} = "hw_mon.cfg";
$UXMON_MODULE_CFGFILE{mdmon} = "md_mon.cfg";
$UXMON_MODULE_CFGFILE{bondmon} = "bond_mon.cfg";
$UXMON_MODULE_CFGFILE{mpmon} = "mp_mon.cfg";
$UXMON_MODULE_CFGFILE{swapmon} = "swap_mon.cfg";

#########################################################
# The complete CFG PATH
foreach $k (@UXMON_LIST_MODULES)
{
    $UXMON_MODULE_CFGPATH{$k} = catfile($UXMON_OVO_CFG,$UXMON_MODULE_CFGFILE{$k});
}

###############################################################################

#########################
# this line is mandatory
return 1;
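
The `foreach` loops in UXMONbroker.cfg simply join a base directory with a per-module file name. The short Python sketch below mirrors that for a few modules so you can see where a module's log and config end up. It is an illustration only: `MODULES` and `module_paths` are names invented here, and `posixpath.join` stands in for the `catfile` call, on the assumption that `catfile` behaves like Perl's File::Spec function of that name.

```python
import posixpath

# Base directories as set in UXMONbroker.cfg.
OVO_BASE = "/var/opt/OV"
OVO_LOG = OVO_BASE + "/log/OpC"
OVO_CFG = OVO_BASE + "/conf/OpC"

# A few modules from the listing, with their log and config file names.
MODULES = {
    "ntpmon": ("ntp_mon.log", "ntp_mon.cfg"),
    "mpmon": ("mp_mon.log", "mp_mon.cfg"),
    "swapmon": ("swap_mon.log", "swap_mon.cfg"),
}


def module_paths(name):
    """Return (log path, config path) for a module, like the LOGPATH/CFGPATH loops."""
    logfile, cfgfile = MODULES[name]
    return posixpath.join(OVO_LOG, logfile), posixpath.join(OVO_CFG, cfgfile)
```

For `ntpmon` this yields `/var/opt/OV/log/OpC/ntp_mon.log` and `/var/opt/OV/conf/OpC/ntp_mon.cfg`, the same two files referenced in the NTP offset alert at the top of this post.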

UXMONmetrics.cfg

#!/usr/bin/perl
##################################################################################
#@(#) $Id: UXMONmetrics.cfg-linux 2132 2014-08-22 06:47:32Z zhaofeif $
#@(#) $Rev: 2132 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2014-08-22 14:47:32 +0800 (Fri, 22 Aug 2014) $
#@(#) $LastChangedBy: zhaofeif $
##################################################################################
#
# Resource Monitoring configuration file
#__________________________________________
#
#
# This is the configuration file where you can “build” the metrics to be used in the rc_mon.cfg
# in other words, each metric you use in rc_mon.cfg MUST HAVE its definition here if you want
# to use it, otherwise such line in the rc_mon.cfg will be ignored.
#
# The syntax to create a METRIC is as follows:
# SYNTAX
#-------------------------
# file:= categories
# | category categories
#
# category:= defcategory metrics closecategory
# metrics:= metric
# | metric metrics
# metric:= metricname command units anno_cmd closemetric
# metricname:=
# command:= command = TEXT
# units:= units = UNITS
# anno_cmd:= anno_cmd = TEXT
# closemetric:=
# closecategory:= end object
#
# Basically the idea is to define several categories with the MetricCategory line. Within each category the metric name MUST BE UNIQUE to avoid
# collisions. Within each category you can define one or more metrics.
# Each metric consists of a name, a command to be executed, a units definition and an annotation command.
# The command to be executed is the metric itself, and it must extract the information in precisely
# a well-defined format (see the output format section below). UXMONrcmon executes the commands you define here
# and parses their output to find out whether the thresholds set in rc_mon.cfg have been exceeded.
# The units entry defines the unit of the values you are working with.
#
# OUTPUT FORMAT OF COMMAND
#----------------------------
# The metric command must write a numeric value for the metric to standard output. Decimal points are allowed, but no text; only numbers.
# Note that if you define a metric that takes information from several instances (see rc_mon.cfg), the output must be one line per instance:
# INSTANCE1 VALUE
# INSTANCE2 VALUE
# ..and so on
# for example
# /var/opt 15
# /var/tmp 20
# …

##
## Some examples
##--------------------
#MetricCategory = diskspace
#
#command = df -m|awk '{ if ($1 !~/Filesystem/ && $3>=0) { printf("%s %.2f\n", $NF,$(NF-3));} }'
#units = MB
#anno_cmd = df -m
#
#end object

##MetricCategory = Processor
##
## command = vmstat 2 4 | awk ' { getline;getline;getline;getline;getline; printf ("CPU %d\n",$13); }'
## anno_cmd = vmstat 2 4
##
##end object
#
##
## Metric definition for kernel parameters monitoring
##
#MetricCategory = kernel
#
#command = tail -1 /proc/sys/fs/file-nr | /bin/awk '{printf("NF %.2f\n",$2*100/$3);}'
#units = %
#anno_cmd = tail -1 /proc/sys/fs/file-nr
#
#end object
#
##
## Metric definition for user quota monitoring on all filesystem
##
#MetricCategory = quota
#
##command = repquota -v | awk '{ if ($4 ~ /[[:digit:]+]/) {printf("%s %.2f\n", $1, 100*$3/$4);} }'
#command = repquota -v -a | awk '{ if ($4 ~ /[[:digit:]+]/) {printf("%s %.2f\n", $1, 100*$3/$4);} }'
#units = %
#anno_cmd = repquota -v -a
#
#end object
#
##
## Metric definition for swap monitoring (total swap space used is monitored)
##
#MetricCategory = swap
#
#command = /usr/bin/free -b | grep Swap | awk '{printf("total %.0f\n", $3*100/$2);}'
#units = %
#anno_cmd = /usr/bin/free -b
#
#end object
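
The "INSTANCE VALUE" output format that every metric command must produce is easy to check by hand. The Python sketch below reads such output into a dictionary. It is only an illustration of the format described in the comments, not the parser used by UXMONrcmon; `parse_metric_output` is a made-up name, and silently skipping lines whose last column is not numeric is an assumption.

```python
def parse_metric_output(output):
    """Turn 'INSTANCE VALUE' lines into {instance: value}."""
    values = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or incomplete line
        try:
            # The value is the last column; everything before it is the instance.
            values[" ".join(parts[:-1])] = float(parts[-1])
        except ValueError:
            continue  # last column is not a number, so the line is skipped
    return values
```

Piping the output of one of the example commands (for instance the swap one, which prints `total NN`) through this function is a quick way to confirm that a new metric command returns something the monitor can compare against a threshold.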

UXMONperf.cfg

###############################################################################
#@(#) $Id: UXMONperf.cfg 2178 2015-05-08 03:08:02Z zhaofeif $
#@(#) $Rev: 2178 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-05-08 11:08:02 +0800 (Fri, 08 May 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################

ALARM GBL_SWAP_SPACE_UTIL > 90 FOR 10 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_SWAP_SPACE_USED_UTIL total exceeds 90%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_SWAP_SPACE_USED_UTIL total exceeds 90%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_SWAP_SPACE_USED_UTIL > 90% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_CPU_TOTAL_UTIL > 98 FOR 5 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_CPU_SYS_MODE_UTIL > 98% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_CPU_TOTAL_UTIL > 85 FOR 15 MINUTES
START EXEC "echo 'PERFMON: warning Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 30 MINUTES
EXEC "echo 'PERFMON: warning Message: GBL_CPU_TOTAL_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_CPU_SYS_MODE_UTIL > 85% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

# ALM 18812, enhancement for memory
ALARM GBL_MEM_UTIL > 98 FOR 5 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_MEM_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_MEM_UTIL total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_UTIL > 98% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_MEM_UTIL > 85 FOR 15 MINUTES
START EXEC "echo 'PERFMON: warning Message: GBL_MEM_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 30 MINUTES
EXEC "echo 'PERFMON: warning Message: GBL_MEM_UTIL total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_UTIL > 85% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_MEM_PAGEOUT_RATE > 98 FOR 5 MINUTES
START EXEC "echo 'PERFMON: critical Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 15 MINUTES
EXEC "echo 'PERFMON: critical Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 98%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_PAGEOUT_RATE > 98% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

ALARM GBL_MEM_PAGEOUT_RATE > 85 FOR 15 MINUTES
START EXEC "echo 'PERFMON: warning Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
REPEAT EVERY 30 MINUTES
EXEC "echo 'PERFMON: warning Message: GBL_MEM_PAGEOUT_RATE total exceeds threshold 85%' >> /var/opt/OV/log/OpC/perf_mon.log"
END EXEC " echo 'GBL_MEM_PAGEOUT_RATE > 85% ENDED' >> /var/opt/OV/log/OpC/perf_mon.log"

uxmon_selfcheck.cfg

###############################################################################
#@ $Id: uxmon_selfcheck.cfg 2152 2015-03-03 09:11:56Z zhaofeif $
#@ $Rev: 2152 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 17:11:56 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: uxmon_selfcheck.cfg
#
#############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
#==============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# solution = uxmon
# [check_enable = yes|no]
# Filename threshold severity schedule group
# ===========================================================================
# /dir/file1.log 15m WARNING 0000-2400 * MYGROUP
# /dir/file2.log 1d CRITICAL 0000-2400 * OS
#
#
# check_enable
#===============
# If check_enable is set to YES (or yes), selfcheck checks all the log files. Otherwise it ignores all log files
#
# DEFAULTS
#===============
# If Severity is not specified, warning is assumed.
# If the threshold has no unit, s (seconds) is assumed.
# If GROUP is not specified, the group NONE is assumed.
# If no qualifier (> or <) is given, ">" is assumed
#
# THRESHOLD|MISSING
#================
#
# The INTERVAL can be given in different units. Possible units for file age are:
# (no unit) => seconds
# s => seconds
# m => minutes
# h => hours
# d => days
# CMA SUPPORT
#========================================
# You can set the EventType and EventTypeInstance with a [ ] bracket statement,
# which defines the values used by the lines that follow it.
# The syntax of such a line is: [ OBJECT , EventType, EventTypeInstance ]
# The default value in all cases is NONE
# You can state only the object: [ OBJECT ]
# You can state the object and EventType and leave EventTypeInstance at its default value: [OBJECT ,EventType ]
# But you can never use more than three fields or fewer than one
# Fields must be separated by commas; blank spaces are not allowed
# [ OS, OS, SapApp ]

solution = uxmon
check_enable = no
/var/opt/OV/log/OpC/df_mon.log 1d warning 0000-2400 * OS
/var/opt/OV/log/OpC/act_mon.log 1d warning 0000-2400 * OS
/var/opt/OV/log/OpC/ps_mon.log 1d warning 0000-2400 * OS

#############################################################################
# end of uxmon_selfcheck.cfg
#############################################################################
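
To make the threshold units in the file above concrete, here is a small sketch that converts a threshold such as 15m or 1d into seconds. The function name to_seconds is my own and is not part of UXMON; it only mirrors the unit table in the comments.

```shell
# Hypothetical helper (not part of UXMON): convert a selfcheck threshold
# such as "15m" or "1d" into seconds, following the unit table in the
# uxmon_selfcheck.cfg comments. A bare number is treated as seconds.
to_seconds() {
  case $1 in
    *s) echo "${1%s}" ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *d) echo $(( ${1%d} * 86400 )) ;;
    *)  echo "$1" ;;
  esac
}

# Example: "to_seconds 15m" prints 900, "to_seconds 1d" prints 86400.
```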

uxmonsyslog.cfg

root@cavaas05:/var/opt/OV/bin/instrumentation # cat uxmonsyslog.cfg
#!/usr/bin/perl
###############################################################################
#@(#) $Id: uxmonsyslog.cfg 2214 2015-06-09 02:20:55Z zhaofeif $
#@(#) $Rev: 2214 $
#@(#) $Author: zhaofeif $
#@(#) $Date: 2015-06-09 10:20:55 +0800 (Tue, 09 Jun 2015) $
#@(#) $LastChangedBy: zhaofeif $
###############################################################################
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
#
# REARM
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes

####################
DEFAULT_SEVERITY=WARNING
#Define default severity, can be WARNING, MINOR, MAJOR, CRITICAL
#if not defined, default severity will be WARNING

################
# ErrorClassList
# [WARNING|MINOR|MAJOR|CRITICAL] If not define, use default severity
# [H|S|O|U]:[PERM|TEMP|PERF|PEND|UNKN|INFO] can define error_class:error_type
# H -> Hardware S -> Software O -> errlogger command messages U -> Unknown
# It specfies the classes of errors that can generate an ITO message
#

ERRORCLASSLIST = H
#ERRORCLASSLIST = CRITICAL H:PERM, H:TEMP, S
#ERRORCLASSLIST = S,O,U

##################
# Identifier filter
# You can specify identifier filters, that will cause such events
# be ignored
# See errpt -t for full list
# you can set as many lines ID_FILTER as you wish, if you provide more than one ID in each line
# separate them with comma
# ID_FILTER = AA8AB241, AA8AB242
# ID_FILTER = AA8AB423

###################
# Force Include
# You can force to include specific Identifiers even when such
# belong to a class list that is not explicitely inclyded in ERRORCLASSLIST
# The syntax is like ID_FILTER
# ID_INCLUDE = [WARNING|MINOR|MAJOR|CRITICAL] AA8AB241
# If severity not define, use default severity

#ID_INCLUDE = MAJOR BFE4C025
###################
#AUTOACTION=ERROR_ID::comand
#If the command returns error it will be reported, if not it exits silently
#

vc_mon.cfg

###############################################################################
#@ $Id: vc_mon.cfg 2162 2015-03-19 08:40:05Z zhaofeif $
#@ $Rev: 2162 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-19 16:40:05 +0800 (Thu, 19 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
###############################################################################

#############################################################################
# File: vc_mon.cfg
#
#############################################################################
#
# configuration for VC monitoring
#
# Description of parameters
# —————————————————————————
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [msg_group = ]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# msg_group
#===============
# msg_group is the alert type you can configure which alarm will be raised. Possible
# values are:
# B -> Browser
# N -> Browser+Notification
# TT -> Browser+Trouble Ticket
# NT -> Browser+Notification+Trouble Ticket
# V -> for ACF

# RG[0]=XYZ Package name 1
# RG_NODE[0]=ABC Primary node on which the package must run
# Define as * will disable running on adoptive node check
# Define as *ALL will treat as parallel resource group
# RG_SWTCH[0]=1 Set to 1 if Package_switching should be ENABLED
# Set to 0 if Package_switching must not be ENABLED
#
# numbers in [] represent a package
#
#
#############################################################################
# example configuration:
# REARM = TRUE
# msg_group = V_ACF
# RG[0]=vc_pack_one; RG_NODE[0]=vc_node_one; RG_SWTCH[0]=1
# RG[1]=vc_pack_two; RG_NODE[1]=vc_node_two; RG_SWTCH[1]=1
# RG[2]=vc_pack_three; RG_NODE[2]=*; RG_SWTCH[2]=1
# RG[3]=vc_pack_three; RG_NODE[3]=*ALL; RG_SWTCH[3]=1
#############################################################################
# end of vc_mon.cfg
#############################################################################

vol_mon.cfg

#############################################################################
#@ $Id: vol_mon.cfg 2214 2015-06-09 02:20:55Z zhaofeif $
#@ $Rev: 2214 $
#@ $Author: zhaofeif $
#@ $Date: 2015-06-09 10:20:55 +0800 (Tue, 09 Jun 2015) $
#@ $LastChangedBy: zhaofeif $
##############################################################################
#
# Syntax:
# [REARM = TRUE|FALSE]
# [disable = yes|no]
# [interval = ]
# [Disable_Lv_No_Check = yes|no]
# [exclude_lv_no_check ]
#
# [*PACKAGE pkgname]
# [*IPISLOCAL ]
# or
# exclude []
# = hhmm-hhmm []
# = n[,] | * (is also posible a range like 1-5)
# where n represents day of a week starting with
# Sunday=0 and Saturday=6;
# * means all days
# (if you use 1,* or 3,*,4 alike is not allowed)
# exclude_from_stale_extend_check
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
#
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# If the module will allow to run after the interval minutes
#
# Disable_Lv_No_Check
#===============
# If set disable to YES (or yes), will disable Open Lv and Cur Lv equal number check
# The “exclude_lv_no_check ” syntax is used to disable Open Lv and Cur Lv equal number check of specific volume group.
#
#
#
#======================
# This configuration file for volume monitor is optional. It may be used
# to specify file systems which should be mounted but don’t appear in the
# file /etc/fstab (for HP-UX and Linux)(e.g. ServiceGuard file systems) or /etc/vfstab
# (for Solaris) and /usr/sbin/lsfs (for Aix). The configuration file
# consists of one-line entries each specifying a separate filesystem. In addition, volmon
# not only monitors if the file system is mounted to the specified mount point, but also
# checks the status of logical and physical volumes.
#
# Blank lines and comment lines beginning with a “#” are ignored. Also,
# extra fields after the filesystem entry on the same line are ignored.
# This is useful for specifying the disk space monitoring configuration file
# (df_mon.cfg) as the configuration file for volume monitor.
#
# *PACKAGE. Cluster Awareness
#=============================
# There is another directive , *PACKAGE pkgname, that can be used to inform to VOLMON that
# this resource is managed by a Cluster, and more specifically included in the package named
# pkgname
# The VOLMON , prior to test that file system, it will check if the cluster package is in fact
# running in this local node, if yes, it will process it as any other file system. If it’s not
# running in this node (therefore is running in another one), it will ignore this filesystem
# about the mounted check
#
# IPISLOCAL. IP address or FQHN
#=============================
# Check if IP address or FQHN is assigned to monitored node and if a corresponding interface is UP
#
# exclude
#========================
# The “exclude ” syntax is used to force entries in /etc/fstab (for instance in HP-UX)
# to be ignored. /etc/fstab is intended for listing permanent filesystems
# only, but in some cases temporary filesystems (not usually mounted) are
# also listed. In this case, the temporary filesystems should be listed
# here with the “exclude” option so that vol_mon will not report on them.
#
# exclude_from_stale_extend_check
#======================================================
# The “exclude_from_stale_extend_check ” syntax is used to
# ignore logical file system’s stale extend check. Once you configured the exist Logical file system

############################################################################
# For example:
# REARM = TRUE
# disable this module
# disable = yes

# the volmon will execute every 5 minutes.
# interval = 5

# exclude_from_stale_extend_check /dev/vg00/lvol1
# exclude_from_stale_extend_check /dev/vg00/lvol2

# then volmon will do not check /dev/vg00/lvol1 and /dev/vg00/lvol1 about stale extend,
# certainly there will have no alarm report about /dev/vg00/lvol1 and /dev/vg00/lvol1 stale extend.

#############################################################################
# end of vol_mon.cfg
############################################################################

UXMON: Cluster lock device not up: node02: /dev/disk/disk182(STATUS:unknown) check with cmviewcl

Node : node01.setaoffice.com
Node Type : Itanium 64/32(HTTPS)
Severity : major
OM Server Time: 2015-10-18 05:39:52
Message : UXMON: Cluster lock device not up: node02: /dev/disk/disk182(STATUS:unknown) check with cmviewcl -v -l node
Msg Group : OS
Application : sgmon
Object : cmviewcl
Event Type : not_found
Instance Name : not_found

Instruction : check with cmviewcl -v -l node ;

Do not close this case before it is resolved.
As long as this EWM-case is not resolved or closed, monitoring is disabled

Check the HP Serviceguard status

root@node01:/root # cmviewcl -v

CLUSTER STATUS
cluster_hpux up

NODE STATUS STATE
node01 up running

Cluster_Lock_LVM:
VOLUME_GROUP PHYSICAL_VOLUME STATUS
/dev/vglock /dev/disk/disk75 up

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up LinkAgg0 lan900
PRIMARY up LinkAgg1 lan901
PRIMARY up LinkAgg2 lan902
PRIMARY up LinkAgg3 lan903

PACKAGE STATUS STATE AUTO_RUN NODE
dbciSMP up running enabled vlunx014

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 142.40.81.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled node01 (current)
Alternate unknown node02

Other_Attributes:
ATTRIBUTE_NAME ATTRIBUTE_VALUE
Style legacy
Priority no_priority

NODE STATUS STATE
node02 down halted

Cluster_Lock_LVM:
VOLUME_GROUP PHYSICAL_VOLUME STATUS
/dev/vglock /dev/disk/disk182 unknown

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY unknown LinkAgg0 lan900
PRIMARY unknown LinkAgg1 lan901
PRIMARY unknown LinkAgg2 lan902
PRIMARY unknown LinkAgg3 lan903
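
On a larger cluster the output is long, so a filter can help. This is only a sketch that assumes the cmviewcl -v layout shown above (a NODE header line followed by the node line, and a Cluster_Lock_LVM: block with the status in the third column). The function name lock_not_up is hypothetical.

```shell
# Hypothetical filter: read "cmviewcl -v" output on stdin and print
# "node physical_volume status" for every cluster lock device that is not up.
# Assumes the layout shown in the transcript above.
lock_not_up() {
  awk '
    $1 == "NODE" && $2 == "STATUS" { want_node = 1; next }   # node header
    want_node { node = $1; want_node = 0; next }              # node line
    $1 == "Cluster_Lock_LVM:" { in_lock = 1; next }           # lock block starts
    in_lock && (NF == 0 || $1 ~ /:$/) { in_lock = 0; next }   # lock block ends
    in_lock && $1 == "VOLUME_GROUP" { next }                  # column header
    in_lock && $3 != "up" { print node, $2, $3 }
  '
}

# Usage: cmviewcl -v | lock_not_up
```

With the output above, this would print node02 /dev/disk/disk182 unknown.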

In our case, we decommissioned a cluster node, so node02 is always down. We added a setting to the HPOM monitor configuration to disable the cluster lock check

node01:/var/opt/OV/bin/instrumentation# cp sg_mon.cfg /var/opt/OV/conf/OpC
node01:/var/opt/OV/bin/instrumentation# vi /var/opt/OV/conf/OpC/sg_mon.cfg
DISABLE_CLUSTER_LOCK_CHECK = yes

ERROR: Target boot environment not identified as being Solaris 10.

I was installing the recommended patch bundle for Solaris 10 and received an error saying that the server is not Solaris 10

root@solaris10:/tmp/patch/10_Recommended # ./installpatchset --apply-prereq --s10patchset
ERROR: Target boot environment not identified as being Solaris 10.

root@solaris10:/tmp/patch/10_Recommended # uname -a
SunOS solaris10 5.10 Generic_147440-12 sun4v sparc SUNW,Sun-Blade-T6320

Check the two packages SUNWcsr and SUNWcsu

root@solaris10:~ # pkginfo -l SUNWcsr
ERROR: information for "SUNWcsr" was not found

root@solaris10:~ # pkginfo -l SUNWcsu
PKGINST: SUNWcsu
NAME: Core Solaris, (Usr)
CATEGORY: system
ARCH: sparc
VERSION: 11.10.0,REV=2005.01.21.15.53
BASEDIR: /
VENDOR: Oracle Corporation
DESC: core software for a specific instruction-set architecture
PSTAMP: on10-patch20120109151043
INSTDATE: Nov 18 2012 01:12
HOTLINE: Please contact your local service provider
STATUS: completely installed
FILES: 1666 installed pathnames
79 shared pathnames
295 linked files
144 directories
480 executables
30 setuid/setgid executables
30370 blocks used (approx)

In this case, the file /var/sadm/pkg/SUNWcsr/pkginfo was missing. Compare the first listing below, where pkginfo is absent, with the second, where it is present

root@solaris10:/var/sadm/pkg/SUNWcsr # ls -l
total 4
drwxr-xr-x 2 root root 512 Nov 18 2012 install
drwxr-x--- 27 root root 1024 Nov 18 2012 save

root@solaris10:/var/sadm/pkg/SUNWcsr # ls -l
total 56
drwxr-xr-x 2 root root 1024 Aug 18 2012 install
-rw-r--r-- 1 root root 26479 Aug 18 2012 pkginfo
drwxr-x— 49 root root 1024 Aug 18 2012 save
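
To check whether other packages have the same problem, something like the following sketch could be used. It assumes the usual layout of one directory per package under /var/sadm/pkg, each holding a pkginfo file; the function name missing_pkginfo is hypothetical.

```shell
# Hypothetical check: print every package directory that has no pkginfo file.
# The directory defaults to /var/sadm/pkg; pass another path to test elsewhere.
missing_pkginfo() {
  pkgdir=${1:-/var/sadm/pkg}
  for d in "$pkgdir"/*; do
    [ -d "$d" ] || continue            # skip plain files and an unmatched glob
    [ -f "$d/pkginfo" ] || echo "$d"   # report directories without pkginfo
  done
}

# Usage: missing_pkginfo
```

On the affected server above this would be expected to print /var/sadm/pkg/SUNWcsr.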

There are two recommended ways to solve this problem:
– reinstall the server
– restore from backup

Source: installpatch reports ERROR: Target boot environment not identified as being solaris 10 (Doc ID 1511328.1)

libstdc++.so.6 is needed by hpacucli-8.70-8.0

I have a SUSE 9 server on which hpacucli fails to install

root@suse9:~ # rpm -ivh /tmp/hpacucli-8.70-8.0.noarch.rpm
error: Failed dependencies:
libstdc++.so.6 is needed by hpacucli-8.70-8.0
libstdc++.so.6(CXXABI_1.3) is needed by hpacucli-8.70-8.0
libstdc++.so.6(GLIBCXX_3.4) is needed by hpacucli-8.70-8.0

I compared it with another server where this tool works; this server is missing a package

root@suse9:~ # rpm -qa | grep libstd
libstdc++-3.3.3-43.41

root@anothersuse9:~ # rpm -qa | grep libstd
libstdc++-3.3.3-43.41
libstdc++-3.3.3-43.24
libstdc++-devel-3.3.3-43.24
compat-libstdc++-lsb-4.0.2_20050901-0.4
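
The comparison can also be done mechanically. The sketch below assumes the output of rpm -qa from each server was saved to a file first; the function name missing_here and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: print packages listed in the second file (the working
# server) that are absent from the first file (the broken server).
# Both files are expected to hold "rpm -qa" output, one package per line.
missing_here() {
  here_sorted=$(mktemp) || return 1
  there_sorted=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$here_sorted"
  LC_ALL=C sort -u "$2" > "$there_sorted"
  # comm -13 keeps only the lines unique to the second input
  LC_ALL=C comm -13 "$here_sorted" "$there_sorted"
  rm -f "$here_sorted" "$there_sorted"
}

# Usage: missing_here suse9.pkgs anothersuse9.pkgs | grep libstd
```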

To solve this problem, install compat-libstdc++-lsb

root@suse9:~ # rpm -ivh /tmp/compat-libstdc++-lsb-4.0.2_20050901-0.4.i586.rpm
Preparing... ########################################### [100%]
1:compat-libstdc++-lsb ########################################### [100%]

root@suse9:~ # rpm -ivh /tmp/hpacucli-8.70-8.0.noarch.rpm
Preparing... ########################################### [100%]
1:hpacucli ########################################### [100%]

UXMON: mpathb – Only one path detected, no path redundancy

Also see:
UXMON: volumegroup – Only one path detected, no path redundancy
UXMON: SY1_log2_disk_001 – Only one path detected, no path redundancy

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : major
OM Server Time: 2015-10-14 12:39:19
Message : UXMON: mpathb – Only one path detected, no path redundancy
Msg Group : OS
Application : mpmon
Object : mp
Event Type : not_found
Instance Name : not_found

Instruction : The multipathd -k"show map $device topology" command shows more details

Please check /var/opt/OV/log/OpC/mp_mon.log for more details

The log file shows that the monitor complains about mpathb

root@linux:~ # cat /var/opt/OV/log/OpC/mp_mon.log
Wed Oct 14 13:39:13 2015 : INFO : UXMONmpmon is running now, pid=21954
Wed Oct 14 13:39:13 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 13:39:13 2015 : INFO : UXMONmpmon end, pid=21954
Wed Oct 14 13:56:12 2015 : INFO : UXMONmpmon is running now, pid=29130
Wed Oct 14 13:56:12 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 13:56:12 2015 : INFO : UXMONmpmon end, pid=29130
Wed Oct 14 14:13:13 2015 : INFO : UXMONmpmon is running now, pid=36813
Wed Oct 14 14:13:13 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 14:13:13 2015 : INFO : UXMONmpmon end, pid=36813
Wed Oct 14 14:30:13 2015 : INFO : UXMONmpmon is running now, pid=44029
Wed Oct 14 14:30:13 2015 : Major: mpathb – Only one path detected, no path redundancy
Wed Oct 14 14:30:13 2015 : INFO : UXMONmpmon end, pid=44029
Wed Oct 14 14:47:12 2015 : INFO : UXMONmpmon is running now, pid=51897
Wed Oct 14 14:47:13 2015 : INFO : UXMONmpmon end, pid=51897
Wed Oct 14 15:04:12 2015 : INFO : UXMONmpmon is running now, pid=58833
Wed Oct 14 15:04:12 2015 : INFO : UXMONmpmon end, pid=58833
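
When the log is long, the affected maps can be pulled out with a short filter. This sketch assumes the log line format shown above, where the map name is the eighth whitespace-separated field; the function name single_path_maps is hypothetical.

```shell
# Hypothetical filter: read mp_mon.log on stdin and list each multipath map
# that was reported with only one path, once per map.
single_path_maps() {
  awk '/Only one path detected/ { print $8 }' | sort -u
}

# Usage: single_path_maps < /var/opt/OV/log/OpC/mp_mon.log
```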

On this server mpathb is a local disk, so it was added to the multipath blacklist

root@linux:~ # vi /etc/multipath.conf
blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^sd[ab]$"
devnode "^cciss!c[0-9]d[0-9]*"
}

If the server is a VMware virtual machine, you can safely disable this module.

root@linux:~ # cp /var/opt/OV/bin/instrumentation/mp_mon.cfg /var/opt/OV/conf/OpC/

In the configuration file /var/opt/OV/conf/OpC/mp_mon.cfg, set disable to yes

root@linux:~ # vi /var/opt/OV/conf/OpC/mp_mon.cfg
disable = yes
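
If the same override has to be applied on many servers, editing with vi gets tedious. The following is a sketch only: it assumes the "key = value" line format seen in these UXMON files, and set_cfg_key is a hypothetical name, not an HPOM tool.

```shell
# Hypothetical helper: set "key = value" in a config file. An existing
# uncommented line for the key is rewritten; otherwise the line is appended.
# Assumes simple keys without regex metacharacters (e.g. disable, REARM).
set_cfg_key() {
  cfg_file=$1 cfg_key=$2 cfg_value=$3
  cfg_tmp=$(mktemp) || return 1
  awk -v k="$cfg_key" -v v="$cfg_value" '
    $0 ~ "^[ \t]*" k "[ \t]*=" { print k " = " v; found = 1; next }
    { print }
    END { if (!found) print k " = " v }
  ' "$cfg_file" > "$cfg_tmp" && cat "$cfg_tmp" > "$cfg_file"   # keep file mode
  rm -f "$cfg_tmp"
}

# Usage: set_cfg_key /var/opt/OV/conf/OpC/mp_mon.cfg disable yes
```

Commented lines such as "# disable = yes" are left alone, because the pattern only matches lines that start with the key.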

Solaris 10 – LUN expansion not showing the new size

I entered the format utility and chose type

root@solaris10:/ # format c8t6005076308FFC2A70000000000001143d0
selecting c8t6005076308FFC2A70000000000001143d0
[disk formatted]

FORMAT MENU:
disk – select a disk
type – select (define) a disk type
partition – select (define) a partition table
current – describe the current disk
format – format and analyze the disk
repair – repair a defective sector
label – write label to the disk
analyze – surface analysis
defect – defect list management
backup – search for backup labels
verify – read and display labels
save – save new disk/partition definitions
inquiry – show vendor, product and revision
volname – set 8-character volume name
!<cmd> – execute <cmd>, then return
quit
format> t

I selected 0 to auto configure

AVAILABLE DRIVE TYPES:
0. Auto configure
1. Quantum ProDrive 80S
2. Quantum ProDrive 105S
3. CDC Wren IV 94171-344
4. SUN0104
5. SUN0207
6. SUN0327
7. SUN0340
8. SUN0424
9. SUN0535
10. SUN0669
11. SUN1.0G
12. SUN1.05
13. SUN1.3G
14. SUN2.1G
15. SUN2.9G
16. Zip 100
17. Zip 250
18. Peerless 10GB
19. IBM-2107900-.600
20. other
Specify disk type (enter its number)[19]: 0

After auto configure was selected, the disk showed the new size, and I then labeled it

c8t6005076308FFC2A70000000000001143d0: configured with capacity of 200.98GB
<IBM-2107900-.600 cyl 25726 alt 2 hd 64 sec 256>
selecting c8t6005076308FFC2A70000000000001143d0
[disk formatted]
format> l
Ready to label disk, continue? y
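
The reported capacity can be cross-checked from the geometry line. The sketch below multiplies cylinders, heads and sectors per track, assuming 512-byte sectors (the usual size, but an assumption here); geom_gib is a hypothetical name.

```shell
# Hypothetical helper: capacity in GiB from the geometry printed by format.
# Arguments: cylinders heads sectors-per-track. Assumes 512-byte sectors.
geom_gib() {
  LC_ALL=C awk -v c="$1" -v h="$2" -v s="$3" \
    'BEGIN { printf "%.2f\n", c * h * s * 512 / (1024 * 1024 * 1024) }'
}

# Example: "geom_gib 25726 64 256" prints 200.98
```

With cyl 25726, hd 64 and sec 256 this gives 200.98, which matches the capacity format reported for the expanded LUN.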

You can also follow this procedure: Getting the Solaris format utility to work with an expanded LUN

VxVM vxplex ERROR V-5-1-10870 fsgen/vxplex: Warning: vxsync exited with exitcode 42

We were migrating data to new storage. Instead of using vxevac, we mirrored the volumes

After receiving confirmation that everything is ok, we started removing one side of the mirror.

root@solaris:/ # vxplex -g bkpdg det dat.bkpdg-01
VxVM vxsync INFO V-5-1-4514 VX_FREEZE_ALL ioctl failed
VxVM vxplex ERROR V-5-1-10870 fsgen/vxplex: Warning: vxsync exited with exitcode 42:
Volume data may not be flushed to all plexes

Checking the disk group

root@solaris:/ # vxprint -htg bkpdg
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg bkpdg default default 3000 1222439090.134.prd007-sbc

dm bkpdgnew01 san_vc0_13 auto 16127 104792064 –
dm bkpdgnew02 san_vc0_28 auto 3839 16766976 –
dm bkpdgnew03 san_vc0_25 auto 16127 1048510464 –
dm bkpdgnew04 san_vc0_27 auto 3839 16766976 –
dm bkpdgnew05 san_vc0_14 auto 16127 104792064 –
dm bkpdgnew06 san_vc0_8 auto 3839 62904320 –
dm bkpdgnew07 san_vc0_21 auto 16127 146735104 –
dm bkpdgnew08 san_vc0_7 auto 16127 104792064 –
dm bkpdgnew09 san_vc0_24 auto 16127 209649664 –
dm bkpdgnew10 san_vc0_4 auto 16127 104792064 –
dm bkpdgnew11 san_vc0_6 auto 16127 104792064 –
dm bkpdg01 ibm_ds8x000_1153 auto 81663 106823680 –
dm bkpdg02 ibm_ds8x000_1167 auto 66943 18800640 –
dm bkpdg03 ibm_ds8x000_1164 auto 81663 1050542080 –
dm bkpdg04 ibm_ds8x000_1166 auto 66943 18800640 –
dm bkpdg05 ibm_ds8x000_1154 auto 81663 106823680 –
dm bkpdg06 ibm_ds8x000_1148 auto 81663 64880640 –
dm bkpdg07 ibm_ds8x000_1160 auto 81663 148766720 –
dm bkpdg08 ibm_ds8x000_1147 auto 81663 106823680 –
dm bkpdg09 ibm_ds8x000_1163 auto 81663 211681280 –
dm bkpdg10 ibm_ds8x000_1144 auto 81663 106823680 –
dm bkpdg11 ibm_ds8x000_1146 auto 81663 106823680 –

v dat.bkpdg – ENABLED ACTIVE 2025293824 SELECT – fsgen
pl dat.bkpdg-01 dat.bkpdg DETACHED STALE 2025293824 CONCAT – RW
sd bkpdgnew01-01 dat.bkpdg-01 bkpdgnew01 0 104792064 0 san_vc0_13 ENA
sd bkpdgnew02-01 dat.bkpdg-01 bkpdgnew02 0 16766976 104792064 san_vc0_28 ENA
sd bkpdgnew03-01 dat.bkpdg-01 bkpdgnew03 0 1048510464 121559040 san_vc0_25 ENA
sd bkpdgnew04-01 dat.bkpdg-01 bkpdgnew04 0 16766976 1170069504 san_vc0_27 ENA
sd bkpdgnew05-01 dat.bkpdg-01 bkpdgnew05 0 104792064 1186836480 san_vc0_14 ENA
sd bkpdgnew06-01 dat.bkpdg-01 bkpdgnew06 0 62904320 1291628544 san_vc0_8 ENA
sd bkpdgnew07-01 dat.bkpdg-01 bkpdgnew07 0 146735104 1354532864 san_vc0_21 ENA
sd bkpdgnew08-01 dat.bkpdg-01 bkpdgnew08 0 104792064 1501267968 san_vc0_7 ENA
sd bkpdgnew09-01 dat.bkpdg-01 bkpdgnew09 0 209649664 1606060032 san_vc0_24 ENA
sd bkpdgnew10-01 dat.bkpdg-01 bkpdgnew10 0 104792064 1815709696 san_vc0_4 ENA
sd bkpdgnew11-01 dat.bkpdg-01 bkpdgnew11 0 104792064 1920501760 san_vc0_6 ENA
pl dat.bkpdg-02 dat.bkpdg ENABLED ACTIVE 2025293824 CONCAT – RW
sd bkpdg10-01 dat.bkpdg-02 bkpdg10 0 106823680 0 ibm_ds8x000_1144 ENA
sd bkpdg11-01 dat.bkpdg-02 bkpdg11 0 106823680 106823680 ibm_ds8x000_1146 ENA
sd bkpdg08-01 dat.bkpdg-02 bkpdg08 0 106823680 213647360 ibm_ds8x000_1147 ENA
sd bkpdg06-01 dat.bkpdg-02 bkpdg06 0 64880640 320471040 ibm_ds8x000_1148 ENA
sd bkpdg01-01 dat.bkpdg-02 bkpdg01 0 106823680 385351680 ibm_ds8x000_1153 ENA
sd bkpdg05-01 dat.bkpdg-02 bkpdg05 0 106823680 492175360 ibm_ds8x000_1154 ENA
sd bkpdg07-01 dat.bkpdg-02 bkpdg07 0 148766720 598999040 ibm_ds8x000_1160 ENA
sd bkpdg09-01 dat.bkpdg-02 bkpdg09 0 211681280 747765760 ibm_ds8x000_1163 ENA
sd bkpdg03-01 dat.bkpdg-02 bkpdg03 0 1050542080 959447040 ibm_ds8x000_1164 ENA
sd bkpdg04-01 dat.bkpdg-02 bkpdg04 0 15304704 2009989120 ibm_ds8x000_1166 ENA
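
In a disk group with many volumes, the plexes that need attention can be picked out of the listing. This sketch assumes the vxprint -ht column order shown in the header above (pl, name, volume, kstate, state); plexes_not_active is a hypothetical name.

```shell
# Hypothetical filter: read "vxprint -ht" output on stdin and print
# "plex volume kstate state" for each plex that is not ENABLED/ACTIVE.
# The uppercase "PL" header line is skipped because only "pl" is matched.
plexes_not_active() {
  awk '$1 == "pl" && ($4 != "ENABLED" || $5 != "ACTIVE") { print $2, $3, $4, $5 }'
}

# Usage: vxprint -htg bkpdg | plexes_not_active
```

For the listing above this would print dat.bkpdg-01 dat.bkpdg DETACHED STALE.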

We ignored the message and removed the plex. The Veritas support page says to stop the volume and, once it shows that the volume is clean, dissociate the plex. Check the Veritas support page if you want to follow their instructions

root@solaris:/ # vxplex -g bkpdg -o rm dis dat.bkpdg-01

root@solaris:/ # vxprint -htg bkpdg
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE

dg bkpdg default default 3000 1222439090.134.prd007-sbc

dm bkpdgnew01 san_vc0_13 auto 16127 104792064 –
dm bkpdgnew02 san_vc0_28 auto 3839 16766976 –
dm bkpdgnew03 san_vc0_25 auto 16127 1048510464 –
dm bkpdgnew04 san_vc0_27 auto 3839 16766976 –
dm bkpdgnew05 san_vc0_14 auto 16127 104792064 –
dm bkpdgnew06 san_vc0_8 auto 3839 62904320 –
dm bkpdgnew07 san_vc0_21 auto 16127 146735104 –
dm bkpdgnew08 san_vc0_7 auto 16127 104792064 –
dm bkpdgnew09 san_vc0_24 auto 16127 209649664 –
dm bkpdgnew10 san_vc0_4 auto 16127 104792064 –
dm bkpdgnew11 san_vc0_6 auto 16127 104792064 –
dm bkpdg01 ibm_ds8x000_1153 auto 81663 106823680 –
dm bkpdg02 ibm_ds8x000_1167 auto 66943 18800640 –
dm bkpdg03 ibm_ds8x000_1164 auto 81663 1050542080 –
dm bkpdg04 ibm_ds8x000_1166 auto 66943 18800640 –
dm bkpdg05 ibm_ds8x000_1154 auto 81663 106823680 –
dm bkpdg06 ibm_ds8x000_1148 auto 81663 64880640 –
dm bkpdg07 ibm_ds8x000_1160 auto 81663 148766720 –
dm bkpdg08 ibm_ds8x000_1147 auto 81663 106823680 –
dm bkpdg09 ibm_ds8x000_1163 auto 81663 211681280 –
dm bkpdg10 ibm_ds8x000_1144 auto 81663 106823680 –
dm bkpdg11 ibm_ds8x000_1146 auto 81663 106823680 –

v dat.bkpdg – ENABLED ACTIVE 2025293824 SELECT – fsgen
pl dat.bkpdg-02 dat.bkpdg ENABLED ACTIVE 2025293824 CONCAT – RW
sd bkpdg10-01 dat.bkpdg-02 bkpdg10 0 106823680 0 ibm_ds8x000_1144 ENA
sd bkpdg11-01 dat.bkpdg-02 bkpdg11 0 106823680 106823680 ibm_ds8x000_1146 ENA
sd bkpdg08-01 dat.bkpdg-02 bkpdg08 0 106823680 213647360 ibm_ds8x000_1147 ENA
sd bkpdg06-01 dat.bkpdg-02 bkpdg06 0 64880640 320471040 ibm_ds8x000_1148 ENA
sd bkpdg01-01 dat.bkpdg-02 bkpdg01 0 106823680 385351680 ibm_ds8x000_1153 ENA
sd bkpdg05-01 dat.bkpdg-02 bkpdg05 0 106823680 492175360 ibm_ds8x000_1154 ENA
sd bkpdg07-01 dat.bkpdg-02 bkpdg07 0 148766720 598999040 ibm_ds8x000_1160 ENA
sd bkpdg09-01 dat.bkpdg-02 bkpdg09 0 211681280 747765760 ibm_ds8x000_1163 ENA
sd bkpdg03-01 dat.bkpdg-02 bkpdg03 0 1050542080 959447040 ibm_ds8x000_1164 ENA
sd bkpdg04-01 dat.bkpdg-02 bkpdg04 0 15304704 2009989120 ibm_ds8x000_1166 ENA
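
As a sanity check after removing a plex, the subdisk lengths of the remaining plex can be added up and compared with the plex length. This is a sketch assuming the sd line layout shown above (plex name in the third field, length in the sixth); plex_sd_totals is a hypothetical name.

```shell
# Hypothetical filter: read "vxprint -ht" output on stdin and print the
# summed subdisk length for each plex as "plex total".
plex_sd_totals() {
  awk '$1 == "sd" { total[$3] += $6 }
       END { for (p in total) printf "%s %.0f\n", p, total[p] }'
}

# Usage: vxprint -htg bkpdg | plex_sd_totals
```

For dat.bkpdg-02 the ten subdisk lengths above add up to 2025293824, the same as the plex and volume length.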

Source: "VxVM vxplex ERROR V-5-1-10870 fsgen/vxplex: Warning: vxsync exited with exitcode 42: Volume data may not be flushed to all plexes" appears when attempting to dissociate a plex for a volume

HPOM UXMON: kernel table THRESH_NP is over threshold 80%. Use sar or other available tool tocheck out.

Node : hpux.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : warning
OM Server Time: 2015-10-06 14:24:46
Message : UXMON: kernel table THRESH_NP is over threshold 80%. Use sar or other available tool tocheck out.
Msg Group : OS
Application : ktsmon
Object : THRESH_NP
Event Type : not_found
Instance Name : not_found

Instruction : A kernel table is close to be exhausted, this might impact
in the system. Please, check this and take actions if needed.

If the threshold set is too low then increase it in the kts_mon.cfg file

The user application was running a very large number of scripts

root@hpux:/ # ps -ef | grep respawn | wc -l
8142
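
Note that grep may count its own process in the number above. To see which commands fill the process table without guessing the name first, a sketch like the following could help. It expects one command name per line on stdin; top_commands is a hypothetical name, and the ps invocation in the usage line assumes a ps that supports -o comm= (on HP-UX this needs the UNIX95 environment variable).

```shell
# Hypothetical helper: count identical command names read from stdin and
# print the most frequent ones as "count command". Default is the top 10.
top_commands() {
  awk '{ count[$0]++ } END { for (c in count) print count[c], c }' |
    sort -rn | head -n "${1:-10}"
}

# Usage: ps -e -o comm= | top_commands 5
```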

I killed several of them and the load decreased.

root@hpux:/ # uptime
9:33am up 101 day(s), 10:58, 2 users, load average: 2062.32, 3526.27, 3627.53

root@hpux:/ # uptime
9:36am up 101 day(s), 11:01, 2 users, load average: 71.39, 1788.84, 2893.45

root@hpux:/ # uptime
9:38am up 101 day(s), 11:03, 3 users, load average: 13.54, 1261.00, 2575.33

root@hpux:/ # uptime
9:46am up 101 day(s), 11:11, 4 users, load average: 0.63, 254.14, 1509.83

I asked for the application to be restarted because I did not know whether I had killed an important script
