Tag Archives: UXMONbroker

UXMON: slave eno49 of bonding device bond0 in Red Hat Enterprise Linux 7

ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls. ensure your Ticket List/View includes the “Assignee” column, monitor this ticket until the user “ABOPERATOR” is no longer assigned, BEFORE you start work on this ticket.
Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : major
OM Server Time: 2017-03-07 09:50:03
Message : UXMON: slave eno49 of bonding device bond0
Msg Group : OS
Application : bondmon
Object : bond
Event Type : not_found
Instance Name : not_found

Instruction : The ‘cat /sys/class/net/bondX/slave_ethY/operstate’ command shows detail

Please check /var/opt/OV/log/OpC/bond_mon.log for more details

Running UXMONbroker shows that the monitor is trying to read files that do not exist

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
Mon Mar 13 15:47:35 2017 : INFO : UXMONbondmon is running now, pid=17458
cat: /sys/class/net/bond0/slave_eno49/operstate: No such file or directory
Mon Mar 13 15:47:35 2017 : Major: slave eno49 of bonding device bond0
cat: /sys/class/net/bond0/slave_eno50/operstate: No such file or directory
Mon Mar 13 15:47:35 2017 : Major: slave eno50 of bonding device bond0
mv: ‘/dev/null’ and ‘/dev/null’ are the same file
Mon Mar 13 15:47:35 2017 : INFO : UXMONbondmon end, pid=17458

root@linux:~ # ls -l /sys/class/net/bond0/
total 0
-r--r--r-- 1 root root 4096 Mar 2 13:35 addr_assign_type
-r--r--r-- 1 root root 4096 Mar 7 17:38 address
-r--r--r-- 1 root root 4096 Mar 7 17:38 addr_len
drwxr-xr-x 2 root root 0 Mar 2 13:35 bonding
-r--r--r-- 1 root root 4096 Mar 7 17:38 broadcast
-rw-r--r-- 1 root root 4096 Mar 7 17:38 carrier
-r--r--r-- 1 root root 4096 Mar 7 17:38 carrier_changes
-r--r--r-- 1 root root 4096 Mar 2 13:35 dev_id
-r--r--r-- 1 root root 4096 Mar 7 17:38 dev_port
-r--r--r-- 1 root root 4096 Mar 7 17:38 dormant
-r--r--r-- 1 root root 4096 Mar 7 17:38 duplex
-rw-r--r-- 1 root root 4096 Mar 7 17:38 flags
-rw-r--r-- 1 root root 4096 Mar 7 17:38 gro_flush_timeout
-rw-r--r-- 1 root root 4096 Mar 7 17:38 ifalias
-r--r--r-- 1 root root 4096 Mar 2 13:35 ifindex
-r--r--r-- 1 root root 4096 Mar 2 13:35 iflink
-r--r--r-- 1 root root 4096 Mar 7 17:38 link_mode
lrwxrwxrwx 1 root root 0 Mar 7 17:38 lower_eno49 -> ../../../pci0000:00/0000:00:02.0/0000:06:00.0/net/eno49
lrwxrwxrwx 1 root root 0 Mar 7 17:38 lower_eno50 -> ../../../pci0000:00/0000:00:02.0/0000:06:00.1/net/eno50
-rw-r--r-- 1 root root 4096 Mar 7 17:38 mtu
-rw-r--r-- 1 root root 4096 Mar 7 17:38 netdev_group
-r--r--r-- 1 root root 4096 Mar 7 17:38 operstate
-r--r--r-- 1 root root 4096 Mar 2 13:35 phys_port_id
drwxr-xr-x 2 root root 0 Mar 2 13:41 power
drwxr-xr-x 34 root root 0 Mar 2 13:35 queues
-r--r--r-- 1 root root 4096 Mar 7 17:38 speed
drwxr-xr-x 2 root root 0 Mar 2 13:41 statistics
lrwxrwxrwx 1 root root 0 Mar 2 13:35 subsystem -> ../../../../class/net
-rw-r--r-- 1 root root 4096 Mar 7 17:38 tx_queue_len
-r--r--r-- 1 root root 4096 Mar 2 13:35 type
-rw-r--r-- 1 root root 4096 Mar 2 13:35 uevent

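This server runs Red Hat Enterprise Linux 7. The current HPOM bondmon module has a bug: it reads /sys/class/net/bond0/slave_eno49/operstate, but on this system the members of the bond are exposed as lower_eno49 and lower_eno50, so the file it queries does not exist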

root@linux:~ # cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.3"
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Red Hat Enterprise Linux Server release 7.3 (Maipo)
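
While the monitor is looking at the wrong path, the state of each slave can still be checked by hand. The following is only a sketch: the function name bond_slave_states and its optional second argument (an alternative sysfs root, useful for dry runs) are my own and are not part of HPOM. It tries both the lower_* names seen in the listing above and the slave_* names that bondmon expects.

```shell
# bond_slave_states BOND [SYSFS_ROOT]
# Hypothetical helper, not part of HPOM: prints "<interface> <operstate>"
# for every member of the bond, whichever naming scheme the kernel uses.
bond_slave_states() {
    bond=$1
    root=${2:-/sys/class/net}
    for link in "$root/$bond"/lower_* "$root/$bond"/slave_*; do
        # An unmatched glob stays literal, so skip anything without operstate
        [ -e "$link/operstate" ] || continue
        name=${link##*/}      # strip the directory part
        name=${name#lower_}   # strip the RHEL 7 style prefix
        name=${name#slave_}   # strip the older style prefix
        printf '%s %s\n' "$name" "$(cat "$link/operstate")"
    done
}

# Usage: bond_slave_states bond0
```

On this server it would be called as bond_slave_states bond0 and should print one line per interface, eno49 and eno50.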

As a workaround, copy the bond_mon.cfg configuration file to /var/opt/OV/conf/OpC and disable the module there

cp -p /var/opt/OV/bin/instrumentation/bond_mon.cfg /var/opt/OV/conf/OpC
vi /var/opt/OV/conf/OpC/bond_mon.cfg
disable = yes

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
root@linux:~ #
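
If the same workaround has to be applied on several servers, the vi step can be scripted. This is a rough sketch that assumes bond_mon.cfg holds plain "key = value" lines like the disable = yes line shown above; I have not checked the full file format, and the helper name set_disable_yes is made up for this example.

```shell
# set_disable_yes CFG
# Hypothetical helper: makes sure CFG contains the line "disable = yes".
# Assumes simple "key = value" lines, as suggested by the line edited above.
set_disable_yes() {
    cfg=$1
    if grep -q '^[[:space:]]*disable[[:space:]]*=' "$cfg"; then
        # Rewrite the existing setting, keeping the file's owner and mode
        sed 's/^[[:space:]]*disable[[:space:]]*=.*/disable = yes/' "$cfg" > "$cfg.tmp" &&
            cat "$cfg.tmp" > "$cfg"
        rm -f "$cfg.tmp"
    else
        # No uncommented setting yet, so append one
        echo 'disable = yes' >> "$cfg"
    fi
}

# Usage, after the cp -p shown above:
#   set_disable_yes /var/opt/OV/conf/OpC/bond_mon.cfg
```

Run UXMONbroker -check bondmon afterwards to confirm that the module stays quiet.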

UXMONhwmon: severity=major Physical Drive Failed

I have an HP ProLiant DL380 G7 with a physical disk that I know is failing, but HPOM is not raising an alarm.

Checking HPOM hwmon module

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check hwmon
Mon Oct 3 09:37:34 2016 : INFO : HPASM not installed, exit

For the module to work, hpasmcli must be installed. It is installed as /sbin/hpasmcli by the hp-health package

Red Hat Enterprise Linux 5

root@rhel5:~ # which hpasmcli
/sbin/hpasmcli

root@rhel5:~ # rpm -qf /sbin/hpasmcli
hp-health-9.30-1564.32.rhel5

root@rhel5:~ # cat /etc/*release
Red Hat Enterprise Linux Server release 5.11 (Tikanga)

Red Hat Enterprise Linux 6

root@rhel6:~ # which hpasmcli
/sbin/hpasmcli

root@rhel6:~ # rpm -qf /sbin/hpasmcli
hp-health-10.20-1723.28.rhel6.x86_64

root@rhel6:~ # cat /etc/*release
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Red Hat Enterprise Linux Server release 6.7 (Santiago)

HP System Health Application and Insight Management Agents for Red Hat Enterprise Linux 5 (AMD64/EM64T)
HPE System Health Application and Command Line Utilities for Red Hat Enterprise Linux 6 (AMD64/EM64T)
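
A quick pre-check like the one below tells you whether hwmon is likely to exit early with "HPASM not installed". It is a minimal sketch: the function name hpasm_status is mine, and it only tests that the binary exists and is executable, which is an assumption about what the monitor needs.

```shell
# hpasm_status [PATH]
# Hypothetical helper: reports whether hpasmcli is present and executable.
# The default path is the one shown in the sessions above.
hpasm_status() {
    bin=${1:-/sbin/hpasmcli}
    if [ -x "$bin" ]; then
        echo "installed: $bin"
    else
        echo "missing: $bin"
        return 1
    fi
}

# Usage: hpasm_status || echo "install the hp-health package first"
```

When the binary is present, rpm -qf /sbin/hpasmcli shows which hp-health release provides it, as in the sessions above.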

After installing the hp-health package, the HPOM hwmon module works properly

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check hwmon
Mon Oct 3 11:06:34 2016 : INFO : UXMONhwmon is running now, pid=20711
Mon Oct 3 11:06:43 2016 : UXMONhwmon: severity=major Physical Drive Failed
mv: `/dev/null' and `/dev/null' are the same file
Mon Oct 3 11:06:43 2016 : INFO : UXMONhwmon end, pid=20711

Arrange a hardware replacement for the failed disk

UXMON: SSHD Daemon is not running or not doing it properly, please check

Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : normal
OM Server Time: 2016-09-10 08:03:10
Message : UXMON: SSHD Daemon is not running or not doing it properly, please check
Msg Group : OS
Application : sshd_mon
Object : sshd
Event Type : not_found
Instance Name : not_found

Instruction : It has been detected an SSH installation but the SSHD is not running
Please check SSH status, because it might happen also there are still some ssh spawned processes running but the father has died.

Note that if the SSH is not available this might prevent users log in the server and even impact some applications.

HPOM complains that sshd is not running, but it obviously is, because you are connected to the server over ssh

root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon is running now, pid=2250
Fri Sep 23 11:45:46 2016 : SSHDMON: SSHD – Not running
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon end, pid=2250

Check directory /var/run

root@solaris:/var/run # ls -la
total 16
drwxr-xr-x 4 root other 5 Sep 23 11:50 .
drwxr-xr-x 44 root sys 50 Aug 16 11:04 ..
-rw------- 1 root root 6 Jul 7 11:24 ds_agent.pid
drwxr-xr-x 13 root root 13 Aug 10 15:33 install_engine
drwx--x--x 2 root sys 2 Jul 6 14:27 sudo

/var/run should contain many more files than this, including sshd.pid. A populated /var/run looks like this

root@solaris:/var/run # ls -l
total 272
-rw------- 1 root root 0 Sep 10 21:27 AdDrEm.lck
drwxr-xr-x 3 root sys 183 Sep 10 21:43 cacao
-rw-rw-rw- 1 root bin 14 Sep 23 09:20 cdrom_rcm.conf
drwxr-xr-x 2 daemon daemon 183 Sep 23 12:18 daemon
-rw-r----- 1 root root 6 Sep 23 10:41 did_reloader.lock
-rw------- 1 root root 5 Sep 10 21:27 ds_agent.pid
Drw-r----- 1 root root 0 Sep 10 21:28 event_listener_proxy_door
Drw-r--r-- 1 root root 0 Sep 10 21:40 fed_doorglobal
Drw-r--r-- 1 root root 0 Sep 10 21:27 hotplugd_door
Drw-r--r-- 1 root root 0 Sep 10 21:28 ifconfig_proxy_doorglobal
-rw------- 1 root root 0 Sep 10 21:26 ipsecconf.lock
Dr--r--r-- 1 daemon daemon 0 Sep 10 21:26 kcfd_door
-rw------- 1 root root 0 Sep 14 09:07 lockf_raidctl
Dr--r--r-- 1 root root 0 Sep 10 21:26 name_service_door
-rw-r--r-- 1 root root 8 Sep 10 21:40 nfs4_domain
drwxr-xr-x 2 root root 179 Sep 10 21:40 pcmcia
Dr--r--r-- 1 root root 0 Sep 10 21:26 picld_door
Drw-r--r-- 1 root root 0 Sep 10 21:30 pmfd_doorglobal
-rw-r--r-- 1 root sys 58 Sep 10 21:30 psn
Dr-------- 1 root root 0 Sep 10 21:26 rcm_daemon_door
-rw-r--r-- 1 root root 0 Sep 10 21:26 rcm_daemon_lock
-rw------- 1 root root 1068 Sep 10 21:26 rcm_daemon_state
Drw-r--r-- 1 root root 0 Sep 10 21:40 rgmd_receptionist_doorglobal
drwxrwxrwt 2 root root 186 Sep 10 21:27 rpc_door
drwx------ 2 root root 182 Sep 10 21:27 smc898
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid
drwx--x--x 3 root sys 176 Sep 10 21:31 sudo
drwxr-xr-x 3 root root 191 Sep 10 21:26 sysevent_channels
Drw-r--r-- 1 root root 0 Sep 10 21:30 sysevent_proxy_doorglobal
-rw-r--r-- 1 root root 5 Sep 10 21:27 syslog.pid
Drw-r--r-- 1 root root 0 Sep 10 21:27 syslog_door
-rw-r--r-- 1 root root 8192 Sep 10 21:26 tzsync
drwx------ 2 root root 2625 Sep 23 10:26 zones
Drw-r--r-- 1 root root 0 Sep 10 21:30 zoneup_doorglobal

To fix the issue behind the ticket, first list the sshd processes owned by root

root@solaris:/var/run # ps -ef | grep ssh | grep root
root 8047 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17380 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 5570 13878 0 08:08:40 ? 0:00 /usr/lib/ssh/sshd
root 13877 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 1003 13878 0 09:17:01 ? 0:00 /usr/lib/ssh/sshd
root 13903 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 60966 13918 0 00:03:07 ? 0:00 /usr/lib/ssh/sshd
root 48654 13878 0 10:13:22 ? 0:00 /usr/lib/ssh/sshd
root 13918 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17389 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 39554 13878 0 09:21:51 ? 0:00 /usr/lib/ssh/sshd
root 64681 1 0 11:25:02 ? 0:00 /usr/lib/ssh/sshd
root 11912 13878 0 09:29:14 ? 0:00 /usr/lib/ssh/sshd
root 56172 13878 0 11:54:55 ? 0:00 /usr/lib/ssh/sshd
root 17386 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 34708 13878 0 08:51:07 ? 0:00 /usr/lib/ssh/sshd
root 60201 13878 0 09:27:36 ? 0:00 /usr/lib/ssh/sshd
root 55272 1 0 11:54:33 ? 0:00 /usr/lib/ssh/sshd
root 5850 13878 0 08:08:47 ? 0:00 /usr/lib/ssh/sshd
root 9865 44290 0 11:56:17 pts/4 0:00 grep ssh
root 13924 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 13878 1 0 Sep 21 ? 0:01 /usr/lib/ssh/sshd

Create the file /var/run/sshd.pid containing the PID of the sshd daemon

echo 8047 > /var/run/sshd.pid

root@solaris:/var/run # ls -l sshd.pid
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid

sshdmon does not complain anymore

root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon is running now, pid=18095
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon end, pid=18095
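
Since sshdmon appears to rely on /var/run/sshd.pid, it is worth confirming that the PID written to the file really belongs to a live process. The sketch below assumes a POSIX shell such as bash or ksh; the function name pidfile_is_live is hypothetical and is not something HPOM provides.

```shell
# pidfile_is_live FILE
# Hypothetical helper: succeeds only if FILE holds a numeric PID
# and a process with that PID currently exists.
pidfile_is_live() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    # Signal 0 sends nothing; it only checks that the process exists
    kill -0 "$pid" 2>/dev/null
}

# Usage: pidfile_is_live /var/run/sshd.pid && echo "sshd.pid is valid"
```

This only proves that some process has that PID, so it is still worth comparing the number with the ps output above.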
