Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : warning
OM Server Time: 2019-05-02 02:00:04
Message : UXMON: The selfclean UXMON module has failed, please check reason
Msg Group : OS
Application : uxmon
Object : selfclean
Event Type : not_found
Instance Name : not_found
Instruction : No
EventDataSource :
Category: HPOM
HPOM – Checking fmadm faulty and there is a pool without name
Node : sc02-app03.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : major
OM Server Time: 2018-02-06 11:11:01
Message : UXMON: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Msg Group : OS
Application : SOL_mon
Object : FMT
Event Type : not_found
Instance Name : not_found
Instruction : “The Fault Management agent has identified a HW or OS related problem with the severity presented by the ticket.
The problem(s) can be viewed and managed with the command – fmdump
To get a better understanding of the problem and on how to resolve it, locate the event that generated
the ticket in the syslog file /var/adm/messages, a URL will be found (http://sun.com/msg/xxx-nnnn-yy),
follow the link using your Oracle portal account for instructions.”
EventDataSource :
Checking fmadm faulty, I see a pool without a name
root@sc02-app03:~# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 06 15:59:15 4577aa7a-2b00-6eb6-f139-bf2848542fb2 ZFS-8000-HC    Major
Problem Status : open
Diag Engine : zfs-diagnosis / 1.0
System
Manufacturer : unknown
Name : –
Part_Number : unknown
Serial_Number : unknown
System Component
Manufacturer : Oracle-Corporation
Name : ORCL,SPARC-T5-8
Part_Number : unknown
Serial_Number : unknown
Host_ID : (null)
----------------------------------------
Suspect 1 of 1 :
Problem class : fault.fs.zfs.io_failure_wait
Certainty : 100%
Affects : zfs://pool=9fbf9b5d11236d0a
Status : faulted but still in service
Resource
FMRI : "zfs://pool=9fbf9b5d11236d0a"
Status : faulted but still in service
Description : ZFS pool '' has experienced currently unrecoverable I/O failures.
Response : No automated response will occur.
Impact : Read and write I/Os cannot be serviced.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Make sure the affected devices are connected, then run 'zpool
clear'. Please refer to the associated reference document at
http://support.oracle.com/msg/ZFS-8000-HC for the latest service
procedures and policies regarding this diagnosis.
To solve this problem, you need to reboot the server
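Because the pool name is empty in this report, the hexadecimal id in the FMRI is the only handle on the affected pool. The sketch below pulls that id out of `fmadm faulty` output so it can be noted in the ticket before the reboot. The function name `faulted_pool_ids` is hypothetical, and the assumption that every affected pool appears as `zfs://pool=<hex id>` is taken only from the output shown above.

```shell
# Print the unique ZFS pool ids named in 'fmadm faulty' output read from stdin.
# Assumption: pools appear as zfs://pool=<hex id>, as in the report above.
faulted_pool_ids() {
    sed -n 's/.*zfs:\/\/pool=\([0-9a-fA-F]*\).*/\1/p' | sort -u
}

# Example, on the affected host:
#   fmadm faulty | faulted_pool_ids
```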
HPOM – Can’t open perl script “/var/opt/OV/bin/instrumentation/UXMONperfmon”: No such file or directory
root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -help
GD UXMON monitoring package
Broker utility
usage: /var/opt/OV/bin/instrumentation/UXMONbroker [-h | --help] [-x ] [ -d ] [-l ] [-c ] [-f]
[ --col -p param ]
[ -t ] [ -p ] [-b ]
[ --os ]
[ -v ]
-h : this (help) message
--help : this (help) message
-x : triggers the execution of the module passed as parameter
-d : Allows execution activating debug
-check : Same as -x but output is redirected standard output, no logfile used
-l : output the logfile used by the module passed as parameter
-c : output the preferred config file used by the module
-t : output the TEMPORAL folder to be used if needed
-b : output the folder where the commands or instrumentation are located
-perl : output the perl runtime to be used
--col : Execute the collecting information of module
-p : Parameter passed to the recollection
--os : Show the OS name
-v : Version of UXMON package
-f : force the execution of the module bypass interval setting
supported modules are:
actmon, sshdmon, tdfmon, uxmon, nfsmon, selfcheck, swapmon, evm, mpmon, mdmon, cronmon, bondmon, rcmon, volmon, scmon, loopmon, dmesg, advfsmon, ntpmon, hwmon, bootmon, nicmon, perfmon, psmon, lpmon, vcmon, ktsmon, sgmon, dfmon.
This is the interface to the OVO templates. Templates will call this command
to get executed the different modules available, or retrieve configuration
information about the UXMON and the platform
root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check perfmon
Can't open perl script "/var/opt/OV/bin/instrumentation/UXMONperfmon": No such file or directory
root@linux:~ # ls -l /var/opt/OV/log/OpC/perf_mon.log
-rw-rw-r-- 1 root bin 0 Feb 1 23:00 /var/opt/OV/log/OpC/perf_mon.log
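The error suggests the broker looks for a script named UXMON followed by the module name in the instrumentation directory. A minimal sketch for finding which module scripts are missing, assuming that naming convention holds for the modules you pass in (it is inferred from the error message and the UXMONhwmon, UXMONsshdmon and UXMONbondmon log lines, not from UXMON documentation). The function name `missing_uxmon_modules` is hypothetical.

```shell
# List modules whose UXMON<module> script is missing from a directory.
# Usage: missing_uxmon_modules DIR module...
# Assumption: each module is implemented by a file named UXMON<module>.
missing_uxmon_modules() {
    dir="$1"
    shift
    for module in "$@"; do
        [ -f "$dir/UXMON$module" ] || printf '%s\n' "$module"
    done
}

# Example:
#   missing_uxmon_modules /var/opt/OV/bin/instrumentation perfmon hwmon sshdmon
```

If a script is missing, the instrumentation was probably not fully deployed to the node and needs to be distributed again from the management server.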
HPOM (xpl-394) semget operation failed.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Creating directories under OvDataDir
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Directory creation under OvDataDir completed.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Creating configuration files.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Updating XPL configuration.
(xpl-394) semget operation failed.
(RTL-28) No space left on device
[11/26/16 00:28:51] [HPOvXpl] [configure] [ERROR] Unable to update XPL configuration.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Updating component matrix.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Changing the group ownership of the files if the group is defined in XPL.
[11/26/16 00:28:52] [HPOvXpl] [configure] [INFO] Xpl configuration DONE
root@linux:~ # sysctl kernel.sem
kernel.sem = 250 32000 100 128
root@linux:~ # sysctl -w kernel.sem="250 32000 100 256"
kernel.sem = 250 32000 100 256
root@linux:~ # ipcs -l
------ Semaphore Limits --------
max number of arrays = 256
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
root@linux:~ # ipcs -u
------ Semaphore Status --------
used arrays = 128
allocated semaphores = 3500
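The numbers explain the error: the fourth kernel.sem value (SEMMNI) is the maximum number of semaphore arrays, and 128 arrays were already in use against the old limit of 128, so semget could not allocate another one. A minimal sketch of that comparison follows; the function names are hypothetical and the parsing assumes the output formats shown above.

```shell
# Read the array limit (SEMMNI, the fourth kernel.sem value) from a
# "kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI" line on stdin.
semmni_from_sysctl() {
    awk '/^kernel\.sem/ {print $NF}'
}

# Read the "used arrays = N" value from 'ipcs -u' output on stdin.
used_arrays_from_ipcs() {
    awk -F' = ' '/^used arrays/ {print $2}'
}

# Succeed when the arrays in use have reached the limit.
# Usage: sem_arrays_exhausted USED LIMIT
sem_arrays_exhausted() {
    [ "$1" -ge "$2" ]
}

# Example:
#   limit=$(sysctl kernel.sem | semmni_from_sysctl)
#   used=$(ipcs -u | used_arrays_from_ipcs)
#   sem_arrays_exhausted "$used" "$limit" && echo "semaphore arrays exhausted"
```

Note that sysctl -w only changes the running kernel. To keep the new limit after a reboot, the same kernel.sem line is normally added to /etc/sysctl.conf.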
UXMONhwmon: severity=major Physical Drive Failed
I have an HP ProLiant DL380 G7 where I know there is a physical disk with problems, but HPOM is not raising an alarm.
Checking HPOM hwmon module
root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check hwmon
Mon Oct 3 09:37:34 2016 : INFO : HPASM not installed, exit
To have it enabled and working, you need hpasmcli installed. It is located at /sbin/hpasmcli
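A quick prerequisite check can be scripted before running the hwmon module. This is only a sketch: the function name `have_hpasmcli` is hypothetical, and the default path is the one given above.

```shell
# Succeed when an executable hpasmcli exists at the given path.
# The default path /sbin/hpasmcli is the location mentioned in this post.
have_hpasmcli() {
    [ -x "${1:-/sbin/hpasmcli}" ]
}

# Example:
#   have_hpasmcli || echo "hp-health is not installed, hwmon will exit"
```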
Red Hat Enterprise Linux 5
root@rhel5:~ # which hpasmcli
/sbin/hpasmcli
root@rhel5:~ # rpm -qf /sbin/hpasmcli
hp-health-9.30-1564.32.rhel5
root@rhel5:~ # cat /etc/*release
Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Red Hat Enterprise Linux 6
root@rhel6:~ # which hpasmcli
/sbin/hpasmcli
root@rhel6:~ # rpm -qf /sbin/hpasmcli
hp-health-10.20-1723.28.rhel6.x86_64
root@rhel6:~ # cat /etc/*release
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Red Hat Enterprise Linux Server release 6.7 (Santiago)
HP System Health Application and Insight Management Agents for Red Hat Enterprise Linux 5 (AMD64/EM64T)
HPE System Health Application and Command Line Utilities for Red Hat Enterprise Linux 6 (AMD64/EM64T)
After installing the hp-health package, the HPOM hwmon module works properly
root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check hwmon
Mon Oct 3 11:06:34 2016 : INFO : UXMONhwmon is running now, pid=20711
Mon Oct 3 11:06:43 2016 : UXMONhwmon: severity=major Physical Drive Failed
mv: `/dev/null' and `/dev/null' are the same file
Mon Oct 3 11:06:43 2016 : INFO : UXMONhwmon end, pid=20711
Arrange a hardware replacement for the failed disk
UXMON: SSHD Daemon is not running or not doing it properly, please check
Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : normal
OM Server Time: 2016-09-10 08:03:10
Message : UXMON: SSHD Daemon is not running or not doing it properly, please check
Msg Group : OS
Application : sshd_mon
Object : sshd
Event Type :
not_found
Instance Name : not_found
Instruction : It has been detected an SSH installation but the SSHD is not running
Please check SSH status, because it might happen also there are still some ssh spawned processes running but the father has died.
Note that if the SSH is not available this might prevent users log in the server and even impact some applications.
HPOM complains that sshd is not running, but it obviously is, because you are connected to the server over ssh
root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon is running now, pid=2250
Fri Sep 23 11:45:46 2016 : SSHDMON: SSHD – Not running
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon end, pid=2250
Check directory /var/run
root@solaris:/var/run # ls -la
total 16
drwxr-xr-x 4 root other 5 Sep 23 11:50 .
drwxr-xr-x 44 root sys 50 Aug 16 11:04 ..
-rw------- 1 root root 6 Jul 7 11:24 ds_agent.pid
drwxr-xr-x 13 root root 13 Aug 10 15:33 install_engine
drwx--x--x 2 root sys 2 Jul 6 14:27 sudo
/var/run should contain many more files, including sshd.pid, like this listing
root@solaris:/var/run # ls -l
total 272
-rw------- 1 root root 0 Sep 10 21:27 AdDrEm.lck
drwxr-xr-x 3 root sys 183 Sep 10 21:43 cacao
-rw-rw-rw- 1 root bin 14 Sep 23 09:20 cdrom_rcm.conf
drwxr-xr-x 2 daemon daemon 183 Sep 23 12:18 daemon
-rw-r----- 1 root root 6 Sep 23 10:41 did_reloader.lock
-rw------- 1 root root 5 Sep 10 21:27 ds_agent.pid
Drw-r----- 1 root root 0 Sep 10 21:28 event_listener_proxy_door
Drw-r--r-- 1 root root 0 Sep 10 21:40 fed_doorglobal
Drw-r--r-- 1 root root 0 Sep 10 21:27 hotplugd_door
Drw-r--r-- 1 root root 0 Sep 10 21:28 ifconfig_proxy_doorglobal
-rw------- 1 root root 0 Sep 10 21:26 ipsecconf.lock
Dr--r--r-- 1 daemon daemon 0 Sep 10 21:26 kcfd_door
-rw------- 1 root root 0 Sep 14 09:07 lockf_raidctl
Dr--r--r-- 1 root root 0 Sep 10 21:26 name_service_door
-rw-r--r-- 1 root root 8 Sep 10 21:40 nfs4_domain
drwxr-xr-x 2 root root 179 Sep 10 21:40 pcmcia
Dr--r--r-- 1 root root 0 Sep 10 21:26 picld_door
Drw-r--r-- 1 root root 0 Sep 10 21:30 pmfd_doorglobal
-rw-r--r-- 1 root sys 58 Sep 10 21:30 psn
Dr-------- 1 root root 0 Sep 10 21:26 rcm_daemon_door
-rw-r--r-- 1 root root 0 Sep 10 21:26 rcm_daemon_lock
-rw------- 1 root root 1068 Sep 10 21:26 rcm_daemon_state
Drw-r--r-- 1 root root 0 Sep 10 21:40 rgmd_receptionist_doorglobal
drwxrwxrwt 2 root root 186 Sep 10 21:27 rpc_door
drwx------ 2 root root 182 Sep 10 21:27 smc898
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid
drwx--x--x 3 root sys 176 Sep 10 21:31 sudo
drwxr-xr-x 3 root root 191 Sep 10 21:26 sysevent_channels
Drw-r--r-- 1 root root 0 Sep 10 21:30 sysevent_proxy_doorglobal
-rw-r--r-- 1 root root 5 Sep 10 21:27 syslog.pid
Drw-r--r-- 1 root root 0 Sep 10 21:27 syslog_door
-rw-r--r-- 1 root root 8192 Sep 10 21:26 tzsync
drwx------ 2 root root 2625 Sep 23 10:26 zones
Drw-r--r-- 1 root root 0 Sep 10 21:30 zoneup_doorglobal
To fix the issue for the ticket, check the sshd processes owned by root
root@solaris:/var/run # ps -ef | grep ssh | grep root
root 8047 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17380 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 5570 13878 0 08:08:40 ? 0:00 /usr/lib/ssh/sshd
root 13877 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 1003 13878 0 09:17:01 ? 0:00 /usr/lib/ssh/sshd
root 13903 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 60966 13918 0 00:03:07 ? 0:00 /usr/lib/ssh/sshd
root 48654 13878 0 10:13:22 ? 0:00 /usr/lib/ssh/sshd
root 13918 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17389 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 39554 13878 0 09:21:51 ? 0:00 /usr/lib/ssh/sshd
root 64681 1 0 11:25:02 ? 0:00 /usr/lib/ssh/sshd
root 11912 13878 0 09:29:14 ? 0:00 /usr/lib/ssh/sshd
root 56172 13878 0 11:54:55 ? 0:00 /usr/lib/ssh/sshd
root 17386 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 34708 13878 0 08:51:07 ? 0:00 /usr/lib/ssh/sshd
root 60201 13878 0 09:27:36 ? 0:00 /usr/lib/ssh/sshd
root 55272 1 0 11:54:33 ? 0:00 /usr/lib/ssh/sshd
root 5850 13878 0 08:08:47 ? 0:00 /usr/lib/ssh/sshd
root 9865 44290 0 11:56:17 pts/4 0:00 grep ssh
root 13924 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 13878 1 0 Sep 21 ? 0:01 /usr/lib/ssh/sshd
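The listing mixes per-session sshd children with sshd processes whose parent is init (PPID 1). Only the second kind can be the listener whose PID belongs in sshd.pid. A minimal sketch that narrows the list follows; the function name `sshd_listener_candidates` is hypothetical. With several candidates, as here, the listing alone does not say which one serves this host, so the choice still has to be made by hand.

```shell
# Print the PIDs of sshd processes whose parent is init (PPID 1),
# reading 'ps -ef' output from stdin. PID and PPID are the 2nd and 3rd
# columns; the command is taken from the last field, so this assumes
# sshd was started without arguments, as in the listing above.
sshd_listener_candidates() {
    awk '$3 == 1 && $NF ~ /\/sshd$/ {print $2}'
}

# Example:
#   ps -ef | sshd_listener_candidates
```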
Create the file /var/run/sshd.pid containing the PID of the sshd parent process
root@solaris:/var/run # echo 8047 > /var/run/sshd.pid
root@solaris:/var/run # ls -l sshd.pid
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid
sshdmon no longer complains
root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon is running now, pid=18095
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon end, pid=18095
If sshd is not creating the file in /var/run, check the PidFile parameter in /etc/ssh/sshd_config
PidFile /var/run/sshd.pid
Then restart sshd
UXMON:bond1.1504 is down – network bonding interface is reported down but is active on the system
Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : critical
OM Server Time: 2016-09-27 18:35:55
Message : UXMON:bond1.1504 is down
Msg Group : OS
Application : bondmon
Object : bond
Event Type : not_found
Instance Name : not_found
Instruction : The 'cat /sys/class/net/$bond/bonding/mii_status' command shows the detail status
Please check /var/opt/OV/log/OpC/bond_mon.log for more details
The module bondmon is complaining about a network bonding interface down
root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
Wed Sep 28 08:15:43 2016 : INFO : UXMONbondmon is running now, pid=31311
Wed Sep 28 08:15:43 2016 : Critical: bond1.1504 is down
mv: `/dev/null' and `/dev/null' are the same file
Wed Sep 28 08:15:43 2016 : INFO : UXMONbondmon end, pid=31311
bond1.1504 was listed as a network bonding interface, with no interfaces available.
root@linux:~ # ls -l /proc/net/bonding
total 0
-r--r--r-- 1 root root 0 Sep 28 08:48 bond0
-r--r--r-- 1 root root 0 Sep 28 08:48 bond1
-r--r--r-- 1 root root 0 Sep 28 08:48 bond1.1504
-r--r--r-- 1 root root 0 Sep 28 08:48 bond2
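The ticket instruction points at /sys/class/net/$bond/bonding/mii_status. A minimal sketch that prints that status for every bonding master in one pass is shown below. The function name `bond_mii_status` is hypothetical, and whether bondmon reads exactly this file is an assumption based on the instruction text.

```shell
# Print "<bond> <mii_status>" for every interface that has a bonding
# directory under the given sysfs root (default /sys/class/net).
bond_mii_status() {
    root="${1:-/sys/class/net}"
    for dir in "$root"/*/bonding; do
        # Skip the unexpanded glob pattern and unreadable entries.
        [ -r "$dir/mii_status" ] || continue
        bond=$(basename "$(dirname "$dir")")
        printf '%s %s\n' "$bond" "$(cat "$dir/mii_status")"
    done
}

# Example:
#   bond_mii_status | grep -v ' up$'
```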
Removed bond1.1504
root@linux:~ # echo "-bond1.1504" > /sys/class/net/bonding_masters
root@linux:~ # ls -l /proc/net/bonding
total 0
-r--r--r-- 1 root root 0 Sep 28 08:59 bond0
-r--r--r-- 1 root root 0 Sep 28 08:59 bond1
-r--r--r-- 1 root root 0 Sep 28 08:59 bond2
The configuration file for bond1.1504 was missing the parameter VLAN=yes, which the file for bond0.1504 has, so I added the parameter
root@linux:~ # cat /etc/sysconfig/network-scripts/ifcfg-bond1.1504
DEVICE=bond1.1504
BOOTPROT=none
ONBOOT=yes
IPADDRES=10.32.28.175
NETMASK=255.255.255.0
BONDING_OPTS="miimon=1000 mode=active-backup"
root@linux:~ # cat /etc/sysconfig/network-scripts/ifcfg-bond0.1504
DEVICE=bond0.1504
BOOTPROT=none
ONBOOT=yes
IPADDR=10.32.17.87
NETMASK=255.255.254.0
BONDING_OPTS="miimon=1000 mode=active-backup"
VLAN=yes
Bring the network interface up
root@linux:~ # ifup ifcfg-bond1.1504
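Then configure the IP address and netmask shown in the configuration file by hand (the file spells the variable IPADDRES rather than IPADDR, which is probably why ifup did not assign the address)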
root@linux:~ # ifconfig bond1.1504 10.32.28.175 netmask 255.255.255.0
root@linux:~ # ifconfig bond1.1504
bond1.1504 Link encap:Ethernet HWaddr 6C:C2:17:30:88:88
inet addr:10.32.28.175 Bcast:10.32.28.255 Mask:255.255.255.0
inet6 addr: fe80::6ec2:17ff:fe30:8888/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1034 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:78494 (76.6 KiB) TX bytes:468 (468.0 b)
Running UXMONbroker with the module bondmon
root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
Wed Sep 28 09:15:19 2016 : INFO : UXMONbondmon is running now, pid=25212
mv: `/dev/null' and `/dev/null' are the same file
Wed Sep 28 09:15:19 2016 : INFO : UXMONbondmon end, pid=25212
UXMON: File /var/log/cron age exceeds 3d threshold on linux.setaoffice.com
ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls. ensure your Ticket List/View includes the “Assignee” column, monitor this ticket until the user “ABOPERATOR” is no longer assigned, BEFORE you start work on this ticket.
Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : warning
OM Server Time: 2016-09-09 14:51:09
Message : UXMON: File /var/log/cron age exceeds 3d threshold.
Msg Group : OS
Application : actmon
Object : LINUX
Event Type : NONE
Instance Name : NONE
Instruction : No
This is a SUSE Linux server
root@linux:~ # cat /etc/*release
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 4
LSB_VERSION="core-2.0-noarch:core-3.0-noarch:core-2.0-x86_64:core-3.0-x86_64"
On SUSE Linux, cron logs to /var/log/messages.
Comment out the /var/log/cron line in the configuration file
/var/opt/OV/conf/OpC/act_mon.cfg
[LINUX]
#/var/log/cron 3d WARNING 0000-2400 * TT_LINUX
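To confirm that a monitored file really is stale before or after changing the configuration, the age check can be reproduced by hand. The sketch below is not the actmon implementation. It assumes the 3d threshold means the file's modification time is more than three days old, it relies on GNU stat and date, and the function name `file_age_exceeds_days` is hypothetical.

```shell
# Succeed when FILE was last modified more than DAYS days ago.
# Usage: file_age_exceeds_days FILE DAYS
# Returns 2 if the file cannot be examined.
file_age_exceeds_days() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 2
    [ $((now - mtime)) -gt $(($2 * 86400)) ]
}

# Example:
#   file_age_exceeds_days /var/log/cron 3 && echo "/var/log/cron is stale"
```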
UXMON: Too many instances: ntpd
ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls. ensure your Ticket List/View includes the “Assignee” column, monitor this ticket until the user “ABOPERATOR” is no longer assigned, BEFORE you start work on this ticket.
Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : warning
OM Server Time: 2016-09-11 02:03:20
Message : UXMON: Too many instances: ntpd . ARGS: -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -i /var/lib/ntp -c /etc/ntp.conf
Msg Group : OS
Application : psmon
Object : ntpd
Event Type : NONE
Instance Name : NONE
Instruction : No
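When there are two ntpd processes, review your /etc/ntp.conf file. The second process is typically a child that ntpd forks to keep retrying name resolution, so there is most likely an NTP server in the file that cannot be resolved or reached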
root@linux:~ # ps -ef | grep ntp
root 24639 24476 0 10:22 pts/0 00:00:00 grep ntp
ntp 26152 1 0 Sep11 ? 00:00:06 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -i /var/lib/ntp -c /etc/ntp.conf
root 26277 26152 0 Sep11 ? 00:00:00 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -i /var/lib/ntp -c /etc/ntp.conf
Review your /etc/ntp.conf file and restart NTP
root@linux:~ # service ntp stop
Shutting down network time protocol daemon (NTPD) done
root@linux:~ # ps -ef | grep ntp
root 24778 24476 0 10:23 pts/0 00:00:00 grep ntp
root@linux:~ # service ntp start
Starting network time protocol daemon (NTPD) done
root@linux:~ # ps -ef | grep ntp
root 24819 1 0 10:23 ? 00:00:00 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -x -g -u ntp:ntp -i /var/lib/ntp -c /etc/ntp.conf
root 24824 24476 0 10:23 pts/0 00:00:00 grep ntp
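The same check psmon alerts on can be reproduced by counting ntpd processes in the ps output, which also avoids counting the grep itself. This is a sketch under the assumption that the Linux `ps -ef` layout shown above applies (command in the eighth column). The function name `count_ntpd` is hypothetical.

```shell
# Count ntpd processes in 'ps -ef' output read from stdin.
# Assumes the Linux column layout, where the command is the 8th field.
count_ntpd() {
    awk '$8 ~ /\/ntpd$/ {n++} END {print n + 0}'
}

# Example:
#   [ "$(ps -ef | count_ntpd)" -gt 1 ] && echo "more than one ntpd running"
```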
UXMON: Power Supply Error
Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : major
OM Server Time: 2016-08-25 16:40:25
Message : UXMON: Power Supply Error
Msg Group : OS
Application : hwmon
Object : hardware
Event Type : not_found
Instance Name : not_found
Instruction : No
I have an HP ProLiant DL580 Gen9 where I received a ticket about a power supply problem.
Check with hpasmcli
root@linux:~ # rpm -qf /sbin/hpasmcli
hp-health-10.20-1723.26.sles11
root@linux:~ # hpasmcli -s "show powersupply"
Power supply #1
Present : Yes
Redundant: Yes
Condition: Ok
Hotplug : Supported
Power : 235 Watts
Power supply #2
Present : Yes
Redundant: Yes
Condition: DEGRADED
Hotplug : Supported
Power supply #3
Present : Yes
Redundant: Yes
Condition: Ok
Hotplug : Supported
Power : 235 Watts
Power supply #4
Present : Yes
Redundant: Yes
Condition: Ok
Hotplug : Supported
Power : 205 Watts
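On a server with several power supplies it is easy to miss the one that is not Ok. A minimal sketch that filters the output down to the failing units follows, assuming the output format shown above. The function name `degraded_power_supplies` is hypothetical.

```shell
# Print "<supply number> <condition>" for every power supply whose
# Condition is not Ok, reading 'hpasmcli -s "show powersupply"' output
# from stdin.
degraded_power_supplies() {
    awk '/Power supply #/ {psu = $3}
         /Condition/ && $NF != "Ok" {print psu, $NF}'
}

# Example:
#   hpasmcli -s "show powersupply" | degraded_power_supplies
```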
Schedule a replacement with HP