
Putty: Terminal freezes running cat on Suse Linux

For some reason, I couldn't run cat on a file and the terminal seemed frozen.

I solved this problem by connecting first to a Red Hat Enterprise Linux 6 server before connecting to the Suse Linux server.
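The root cause was never identified here. One common culprit worth ruling out when a terminal appears frozen (a guess on my part, not from the original troubleshooting) is XON/XOFF flow control: an accidental Ctrl-S pauses all output until Ctrl-Q is pressed. A sketch for resetting terminal modes:

```shell
#!/bin/sh
# "stty sane" restores sane terminal settings after a session is left
# in a bad state; it only makes sense on an interactive tty.
if [ -t 0 ]; then
    stty sane
else
    echo "stdin is not a tty; run this from the interactive session" >&2
fi
```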


Linux EXT4-fs: error (device dm-156): ext4_lookup: deleted inode referenced: 1091357

Node : serviceguardnode2.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : minor
OM Server Time: 2016-12-22 18:22:32
Message : EXT4-fs: error (device dm-156): ext4_lookup: deleted inode referenced: 1091357
Msg Group : OS
Application : dmsg_mon
Object : EXT4
Event Type :
not_found

Instance Name :
not_found

Instruction : No

Checking which device is complaining. dm-156 is /dev/vgWPJ/lv_orawp0

root@serviceguardnode2:/dev/mapper # ls -l | grep 156
lrwxrwxrwx. 1 root root 9 Dec 14 22:15 vgWPJ-lv_orawp0 -> ../dm-156

The filesystem is currently mounted

root@serviceguardnode2:/dev/mapper # mount | grep lv_orawp0
/dev/mapper/vgWPJ-lv_orawp0 on /oracle/WPJ type ext4 (rw,errors=remount-ro,data_err=abort,barrier=0)

And the logical volume is open

root@serviceguardnode2:~ # lvs vgWPJ
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_ora11264 vgWPJ -wi-ao---- 30.00g
lv_orawp0 vgWPJ -wi-ao---- 5.00g

This is a clustered environment and it is currently running on the other node

root@serviceguardnode2:/dev/mapper # cmviewcl | grep -i wpj
dbWPJ up running enabled serviceguardnode1

There is a Red Hat note referencing the error – “ext4_lookup: deleted inode referenced” errors in /var/log/messages in RHEL 6.

In clustered environments, which is the case here, these errors appear in /var/log/messages when the other node has the filesystem mounted

root@serviceguardnode2:~ # cmviewcl -v -p dbWPJ

PACKAGE STATUS STATE AUTO_RUN NODE
dbWPJ up running enabled serviceguardnode1

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 5 0 dbWPJmon
Subnet up 10.106.10.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled serviceguardnode1 (current)
Alternate up enabled serviceguardnode2

Dependency_Parameters:
DEPENDENCY_NAME NODE_NAME SATISFIED
dbWP0_dep serviceguardnode2 no
dbWP0_dep serviceguardnode1 yes

Other_Attributes:
ATTRIBUTE_NAME ATTRIBUTE_VALUE
Style modular
Priority no_priority

Checking the filesystems. I need to unmount /oracle/WPJ, but first everything mounted under /oracle/WPJ must be unmounted, otherwise umount will report that /oracle/WPJ is busy

root@serviceguardnode2:~ # df -hP | grep WPJ
/dev/mapper/vgSAP-lv_WPJ_sys 93M 1.6M 87M 2% /usr/sap/WPJ/SYS
/dev/mapper/vgWPJ-lv_orawp0 4.4G 162M 4.0G 4% /oracle/WPJ
/dev/mapper/vgWPJ-lv_ora11264 27G 4.7G 21G 19% /oracle/WPJ/11204
/dev/mapper/vgWPJlog2-lv_origlogb 2.0G 423M 1.4G 23% /oracle/WPJ/origlogB
/dev/mapper/vgWPJlog2-lv_mirrloga 2.0G 404M 1.5G 22% /oracle/WPJ/mirrlogA
/dev/mapper/vgWPJlog1-lv_origloga 2.0G 423M 1.4G 23% /oracle/WPJ/origlogA
/dev/mapper/vgWPJlog1-lv_mirrlogb 2.0G 404M 1.5G 22% /oracle/WPJ/mirrlogB
/dev/mapper/vgWPJdata-lv_sapdata4 75G 21G 55G 28% /oracle/WPJ/sapdata4
/dev/mapper/vgWPJdata-lv_sapdata3 75G 79M 75G 1% /oracle/WPJ/sapdata3
/dev/mapper/vgWPJdata-lv_sapdata2 75G 7.3G 68G 10% /oracle/WPJ/sapdata2
/dev/mapper/vgWPJdata-lv_sapdata1 75G 1.1G 74G 2% /oracle/WPJ/sapdata1
/dev/mapper/vgWPJoraarch-lv_oraarch 20G 234M 19G 2% /oracle/WPJ/oraarch
scsWPJ:/export/sapmnt/WPJ/profile 4.4G 4.0M 4.1G 1% /sapmnt/WPJ/profile
scsWPJ:/export/sapmnt/WPJ/exe 4.4G 2.5G 1.7G 61% /sapmnt/WPJ/exe

Unmounting /oracle/WPJ

root@serviceguardnode2:~ # umount /oracle/WPJ/11204
root@serviceguardnode2:~ # umount /oracle/WPJ/origlogB
root@serviceguardnode2:~ # umount /oracle/WPJ/mirrlogA
root@serviceguardnode2:~ # umount /oracle/WPJ/origlogA
root@serviceguardnode2:~ # umount /oracle/WPJ/mirrlogB
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata4
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata3
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata2
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata1
root@serviceguardnode2:~ # umount /oracle/WPJ/oraarch
root@serviceguardnode2:~ # umount /oracle/WPJ
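The manual order above matters: each child mount must go before its parent. A sketch that derives the same deepest-first order from df automatically (a dry run: it echoes the commands, remove the echo to execute; the awk field and mount-point prefix match the df output above):

```shell
#!/bin/sh
# Unmount everything under /oracle/WPJ, deepest mount points first,
# so a parent mount point is never busy. A lexically reversed sort
# puts child paths (parent prefix plus more characters) before their
# parents.
df -hP | awk '$6 ~ /^\/oracle\/WPJ/ {print $6}' | sort -r |
while read -r mp; do
    echo umount "$mp"   # drop "echo" to actually unmount
done
```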

Linux LVM: Couldn’t find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc

On this server, any command that uses LVM returns an error message complaining about a missing disk

root@linux:~ # pvs
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
WARNING: Inconsistent metadata found for VG oraclevg - updating to use version 24
PV VG Fmt Attr PSize PFree
/dev/mapper/crashvgp1 oraclevg lvm2 a--u 99.98g 99.98g
/dev/mapper/mpathbp1 oraclevg lvm2 a--u 299.96g 299.96g
/dev/mapper/oraclevg_1p1 oraclevg lvm2 a--u 99.98g 0
/dev/mapper/oraclevg_2p1 oraclevg lvm2 a--u 49.98g 0
/dev/sda2 rootvg lvm2 a--u 279.12g 143.62g
unknown device oraclevg lvm2 a-mu 49.98g 49.98g

Volume group oraclevg is showing up duplicated

root@linux:~ # vgs -v
Using volume group(s) on command line.
Cache: Duplicate VG name oraclevg: Existing 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT (created here) takes precedence over R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
Archiving volume group "oraclevg" metadata (seqno 33).
Archiving volume group "oraclevg" metadata (seqno 3).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 3).
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
WARNING: Inconsistent metadata found for VG oraclevg - updating to use version 34
There are 1 physical volumes missing.
There are 1 physical volumes missing.
Archiving volume group "oraclevg" metadata (seqno 3).
Archiving volume group "oraclevg" metadata (seqno 35).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 35).
VG Attr Ext #PV #LV #SN VSize VFree VG UUID VProfile
oraclevg wz--n- 4.00m 2 2 0 149.96g 0 R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
oraclevg wz-pn- 4.00m 3 0 0 449.93g 449.93g 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT
rootvg wz--n- 4.00m 1 10 0 279.12g 143.62g 685XSf-7Dsf-76oL-5pp7-t27Z-nT1o-dqXuUB

To view the properties of a specific volume group, use --select vg_uuid and pass the UUID gathered from the previous command

root@linux:~ # vgdisplay -v --select vg_uuid=5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT
Using volume group(s) on command line.
Cache: Duplicate VG name oraclevg: Existing 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT (created here) takes precedence over R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
Archiving volume group "oraclevg" metadata (seqno 53).
Archiving volume group "oraclevg" metadata (seqno 3).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 3).
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
There are 1 physical volumes missing.
There are 1 physical volumes missing.
Archiving volume group "oraclevg" metadata (seqno 3).
Archiving volume group "oraclevg" metadata (seqno 53).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 53).
--- Volume group ---
VG Name oraclevg
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 53
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 3
Act PV 2
VG Size 449.93 GiB
PE Size 4.00 MiB
Total PE 115181
Alloc PE / Size 0 / 0
Free PE / Size 115181 / 449.93 GiB
VG UUID 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT

--- Physical volumes ---
PV Name /dev/mapper/crashvgp1
PV UUID Q8XgjC-wgao-uABU-6o39-9SVO-DSwE-zFcTSb
PV Status allocatable
Total PE / Free PE 25595 / 25595

PV Name unknown device
PV UUID unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc
PV Status allocatable
Total PE / Free PE 12795 / 12795

PV Name /dev/mapper/mpathbp1
PV UUID IMYMJx-H5xY-d16M-M63Q-1lHt-4oLN-xtzoeJ
PV Status allocatable
Total PE / Free PE 76791 / 76791

Many LVM commands can be run with --select vg_uuid

root@linux:~ # vgchange -a n --select vg_uuid=5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT
WARNING: Inconsistent metadata found for VG oraclevg - updating to use version 54
Volume group "oraclevg" successfully changed
0 logical volume(s) in volume group "oraclevg" now active

I am forcing the removal of the oraclevg copy that is missing a physical volume

root@linux:~ # vgremove --select vg_uuid=5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT -f
Volume group "oraclevg" successfully removed

Running vgs -v no longer shows the duplicate

root@linux:~ # vgs -v
Using volume group(s) on command line.
Archiving volume group "oraclevg" metadata (seqno 3).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 3).
VG Attr Ext #PV #LV #SN VSize VFree VG UUID VProfile
oraclevg wz--n- 4.00m 2 2 0 149.96g 0 R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
rootvg wz--n- 4.00m 1 10 0 279.12g 143.62g 685XSf-7Dsf-76oL-5pp7-t27Z-nT1o-dqXuUB

HPOM (xpl-394) semget operation failed.

[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Creating directories under OvDataDir
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Directory creation under OvDataDir completed.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Creating configuration files.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Updating XPL configuration.
(xpl-394) semget operation failed.
(RTL-28) No space left on device
[11/26/16 00:28:51] [HPOvXpl] [configure] [ERROR] Unable to update XPL configuration.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Updating component matrix.
[11/26/16 00:28:51] [HPOvXpl] [configure] [INFO] Changing the group ownership of the files if the group is defined in XPL.
[11/26/16 00:28:52] [HPOvXpl] [configure] [INFO] Xpl configuration DONE

root@linux:~ # sysctl kernel.sem
kernel.sem = 250 32000 100 128

root@linux:~ # sysctl -w kernel.sem="250 32000 100 256"
kernel.sem = 250 32000 100 256

root@linux:~ # ipcs -l

------ Semaphore Limits --------
max number of arrays = 256
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

root@linux:~ # ipcs -u

------ Semaphore Status --------
used arrays = 128
allocated semaphores = 3500
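For context: kernel.sem packs four limits in order, SEMMSL (max semaphores per array), SEMMNS (max semaphores system-wide), SEMOPM (max operations per semop call) and SEMMNI (max number of arrays). The ipcs -u output shows all 128 arrays in use, which is why semget fails with "No space left on device" until SEMMNI is raised. A sketch that reads the current limit and notes how to make the increase survive a reboot:

```shell
#!/bin/sh
# Read the four kernel.sem values straight from procfs
# (SEMMSL SEMMNS SEMOPM SEMMNI, whitespace-separated).
read -r semmsl semmns semopm semmni < /proc/sys/kernel/sem
echo "SEMMNI (max semaphore arrays) = $semmni"
# sysctl -w is lost on reboot; persist the new value in the stock
# sysctl.conf (adjust the path for sysctl.d-based setups):
# echo 'kernel.sem = 250 32000 100 256' >> /etc/sysctl.conf
```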

hpacucli still running! Stop it first.

If you have a ProLiant Generation 9 server

root@linux:~ # dmidecode | grep -i proliant
Product Name: ProLiant BL660c Gen9
Family: ProLiant
HP ProLiant System/Rack Locator

Use hpssacli to check the disk array instead. Removing hpacucli fails because one of its processes is still running

root@linux:~ # rpm -e hpacucli
hpacucli still running! Stop it first.
error: %preun(hpacucli-9.40-12.0.x86_64) scriptlet failed, exit status 1

root@linux:~ # ps -ef | grep hpacucli
root 38972 1 0 Jun09 ? 00:00:00 /opt/compaq/hpacucli/bld/.hpacucli ctrl all show config
root 89907 4947 0 10:03 pts/7 00:00:00 grep hpacucli

root@linux:~ # kill -9 38972
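The post stops at the kill; once the stuck helper is gone, the %preun scriptlet should no longer abort and the erase should succeed. A hedged sketch of the whole sequence, guarded so it is a no-op on hosts without the package:

```shell
#!/bin/sh
# Remove hpacucli cleanly: stop any lingering helper first, then erase.
# Does nothing if the package is not installed.
if rpm -q hpacucli >/dev/null 2>&1; then
    pkill -9 -f '\.hpacucli' 2>/dev/null  # the hidden helper seen in ps
    rpm -e hpacucli
fi
```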

Adding and removing network interfaces from a network bond

Removing eth3 from bond0

root@linux:~ # ifenslave -d bond0 eth3

root@linux:~ # cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0-2 (October 7, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Speed: 2500 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:d3:85:e7:d1:70

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:d3:85:e7:51:8a

Adding eth3 to bond1

root@linux:~ # ifenslave bond1 eth3

root@linux:~ # cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.4.0-2 (October 7, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 2500 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:d3:85:e7:d1:6c

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:d3:85:e7:51:8b
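ifenslave is one way to do this; on kernels that expose the bonding sysfs interface, the same move can be expressed through /sys (a sketch, assuming the bond0/bond1/eth3 names used above):

```shell
#!/bin/sh
# Move eth3 from bond0 to bond1 via the bonding driver's sysfs files:
# writing "-iface" to a bond's "slaves" file detaches the interface,
# "+iface" attaches it. Guarded so this is a no-op off the target host.
if [ -w /sys/class/net/bond0/bonding/slaves ]; then
    echo "-eth3" > /sys/class/net/bond0/bonding/slaves
    echo "+eth3" > /sys/class/net/bond1/bonding/slaves
else
    echo "bonding sysfs files not present; run on the target host" >&2
fi
```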

Problems starting HP System Management Homepage

root@linux:~ # service hpsmhd restart

Stopping hpsmhd: [FAILED]
Starting hpsmhd: httpd (pid 6681) already running [ OK ]

The process holding that PID is actually an NFS kernel daemon, not httpd

root@linux:~ # ps -ef | grep 6681 | grep -v grep
root 6681 2 0 11:18 ? 00:00:00 [nfsd]

Still researching this error

UXMONhwmon: severity=major Physical Drive Failed

I have an HP ProLiant DL380 G7 with a physical disk that I know is failing, but HPOM is not raising an alarm.

Checking HPOM hwmon module

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check hwmon
Mon Oct 3 09:37:34 2016 : INFO : HPASM not installed, exit

To have it enabled and working, you need hpasmcli installed; it is located at /sbin/hpasmcli

Red Hat Enterprise Linux 5

root@rhel5:~ # which hpasmcli
/sbin/hpasmcli

root@rhel5:~ # rpm -qf /sbin/hpasmcli
hp-health-9.30-1564.32.rhel5

root@rhel5:~ # cat /etc/*release
Red Hat Enterprise Linux Server release 5.11 (Tikanga)

Red Hat Enterprise Linux 6

root@rhel6:~ # which hpasmcli
/sbin/hpasmcli

root@rhel6:~ # rpm -qf /sbin/hpasmcli
hp-health-10.20-1723.28.rhel6.x86_64

root@rhel6:~ # cat /etc/*release
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Red Hat Enterprise Linux Server release 6.7 (Santiago)

HP System Health Application and Insight Management Agents for Red Hat Enterprise Linux 5 (AMD64/EM64T)
HPE System Health Application and Command Line Utilities for Red Hat Enterprise Linux 6 (AMD64/EM64T)

After installing the hp-health package, the HPOM hwmon module works properly

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check hwmon
Mon Oct 3 11:06:34 2016 : INFO : UXMONhwmon is running now, pid=20711
Mon Oct 3 11:06:43 2016 : UXMONhwmon: severity=major Physical Drive Failed
mv: `/dev/null' and `/dev/null' are the same file
Mon Oct 3 11:06:43 2016 : INFO : UXMONhwmon end, pid=20711

Arrange a hardware replacement for the failed disk

UXMON: SSHD Daemon is not running or not doing it properly, please check

Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : normal
OM Server Time: 2016-09-10 08:03:10
Message : UXMON: SSHD Daemon is not running or not doing it properly, please check
Msg Group : OS
Application : sshd_mon
Object : sshd
Event Type :
not_found

Instance Name :
not_found

Instruction : It has been detected an SSH installation but the SSHD is not running
Please check SSH status, because it might happen also there are still some ssh spawned processes running but the father has died.

Note that if the SSH is not available this might prevent users log in the server and even impact some applications.

HPOM is complaining that sshd is not running, but it obviously is running, since the connection to the server is over ssh

root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon is running now, pid=2250
Fri Sep 23 11:45:46 2016 : SSHDMON: SSHD – Not running
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon end, pid=2250

Check directory /var/run

root@solaris:/var/run # ls -la
total 16
drwxr-xr-x 4 root other 5 Sep 23 11:50 .
drwxr-xr-x 44 root sys 50 Aug 16 11:04 ..
-rw------- 1 root root 6 Jul 7 11:24 ds_agent.pid
drwxr-xr-x 13 root root 13 Aug 10 15:33 install_engine
drwx--x--x 2 root sys 2 Jul 6 14:27 sudo

/var/run should normally contain many more files

root@solaris:/var/run # ls -l
total 272
-rw------- 1 root root 0 Sep 10 21:27 AdDrEm.lck
drwxr-xr-x 3 root sys 183 Sep 10 21:43 cacao
-rw-rw-rw- 1 root bin 14 Sep 23 09:20 cdrom_rcm.conf
drwxr-xr-x 2 daemon daemon 183 Sep 23 12:18 daemon
-rw-r----- 1 root root 6 Sep 23 10:41 did_reloader.lock
-rw------- 1 root root 5 Sep 10 21:27 ds_agent.pid
Drw-r----- 1 root root 0 Sep 10 21:28 event_listener_proxy_door
Drw-r--r-- 1 root root 0 Sep 10 21:40 fed_doorglobal
Drw-r--r-- 1 root root 0 Sep 10 21:27 hotplugd_door
Drw-r--r-- 1 root root 0 Sep 10 21:28 ifconfig_proxy_doorglobal
-rw------- 1 root root 0 Sep 10 21:26 ipsecconf.lock
Dr--r--r-- 1 daemon daemon 0 Sep 10 21:26 kcfd_door
-rw------- 1 root root 0 Sep 14 09:07 lockf_raidctl
Dr--r--r-- 1 root root 0 Sep 10 21:26 name_service_door
-rw-r--r-- 1 root root 8 Sep 10 21:40 nfs4_domain
drwxr-xr-x 2 root root 179 Sep 10 21:40 pcmcia
Dr--r--r-- 1 root root 0 Sep 10 21:26 picld_door
Drw-r--r-- 1 root root 0 Sep 10 21:30 pmfd_doorglobal
-rw-r--r-- 1 root sys 58 Sep 10 21:30 psn
Dr-------- 1 root root 0 Sep 10 21:26 rcm_daemon_door
-rw-r--r-- 1 root root 0 Sep 10 21:26 rcm_daemon_lock
-rw------- 1 root root 1068 Sep 10 21:26 rcm_daemon_state
Drw-r--r-- 1 root root 0 Sep 10 21:40 rgmd_receptionist_doorglobal
drwxrwxrwt 2 root root 186 Sep 10 21:27 rpc_door
drwx------ 2 root root 182 Sep 10 21:27 smc898
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid
drwx--x--x 3 root sys 176 Sep 10 21:31 sudo
drwxr-xr-x 3 root root 191 Sep 10 21:26 sysevent_channels
Drw-r--r-- 1 root root 0 Sep 10 21:30 sysevent_proxy_doorglobal
-rw-r--r-- 1 root root 5 Sep 10 21:27 syslog.pid
Drw-r--r-- 1 root root 0 Sep 10 21:27 syslog_door
-rw-r--r-- 1 root root 8192 Sep 10 21:26 tzsync
drwx------ 2 root root 2625 Sep 23 10:26 zones
Drw-r--r-- 1 root root 0 Sep 10 21:30 zoneup_doorglobal

Fixing the issue for the ticket. Checking the ssh processes owned by root

root@solaris:/var/run # ps -ef | grep ssh | grep root
root 8047 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17380 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 5570 13878 0 08:08:40 ? 0:00 /usr/lib/ssh/sshd
root 13877 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 1003 13878 0 09:17:01 ? 0:00 /usr/lib/ssh/sshd
root 13903 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 60966 13918 0 00:03:07 ? 0:00 /usr/lib/ssh/sshd
root 48654 13878 0 10:13:22 ? 0:00 /usr/lib/ssh/sshd
root 13918 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17389 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 39554 13878 0 09:21:51 ? 0:00 /usr/lib/ssh/sshd
root 64681 1 0 11:25:02 ? 0:00 /usr/lib/ssh/sshd
root 11912 13878 0 09:29:14 ? 0:00 /usr/lib/ssh/sshd
root 56172 13878 0 11:54:55 ? 0:00 /usr/lib/ssh/sshd
root 17386 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 34708 13878 0 08:51:07 ? 0:00 /usr/lib/ssh/sshd
root 60201 13878 0 09:27:36 ? 0:00 /usr/lib/ssh/sshd
root 55272 1 0 11:54:33 ? 0:00 /usr/lib/ssh/sshd
root 5850 13878 0 08:08:47 ? 0:00 /usr/lib/ssh/sshd
root 9865 44290 0 11:56:17 pts/4 0:00 grep ssh
root 13924 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 13878 1 0 Sep 21 ? 0:01 /usr/lib/ssh/sshd

Creating file /var/run/sshd.pid with sshd PID

root@solaris:/var/run # echo 8047 > /var/run/sshd.pid

root@solaris:/var/run # ls -l sshd.pid
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid
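Picking the PID by hand works; a sketch that selects a master sshd (an instance whose parent PID, column 3 of ps -ef, is 1) automatically. There are several parent-PID-1 instances in the listing above; this simply takes the first, and the binary path matches the one on this Solaris box:

```shell
#!/bin/sh
# Find a master sshd: parent PID is 1 and the command is the sshd
# binary itself. "exit" stops awk after the first match.
pid=$(ps -ef | awk '$3 == 1 && $NF == "/usr/lib/ssh/sshd" {print $2; exit}')
echo "master sshd pid: ${pid:-none found}"
# On the affected server, recreate the PID file sshdmon expects:
# echo "$pid" > /var/run/sshd.pid
```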

sshdmon does not complain anymore

root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon is running now, pid=18095
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon end, pid=18095

UXMON:bond1.1504 is down – Network Bonding Interface is alarming that it is down but it is active in the system

Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : critical
OM Server Time: 2016-09-27 18:35:55
Message : UXMON:bond1.1504 is down
Msg Group : OS
Application : bondmon
Object : bond
Event Type :
not_found

Instance Name :
not_found

Instruction : The ‘cat /sys/class/net/$bond/bonding/mii_status’ command shows the detail status

Please check /var/opt/OV/log/OpC/bond_mon.log for more details

The module bondmon is complaining about a network bonding interface down

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
Wed Sep 28 08:15:43 2016 : INFO : UXMONbondmon is running now, pid=31311
Wed Sep 28 08:15:43 2016 : Critical: bond1.1504 is down
mv: `/dev/null' and `/dev/null' are the same file
Wed Sep 28 08:15:43 2016 : INFO : UXMONbondmon end, pid=31311

bond1.1504 was showing up among the network bonding interfaces, but with no slave interfaces available.

root@linux:~ # ls -l /proc/net/bonding
total 0
-r–r–r– 1 root root 0 Sep 28 08:48 bond0
-r–r–r– 1 root root 0 Sep 28 08:48 bond1
-r–r–r– 1 root root 0 Sep 28 08:48 bond1.1504
-r–r–r– 1 root root 0 Sep 28 08:48 bond2

Removed bond1.1504

root@linux:~ # echo "-bond1.1504" > /sys/class/net/bonding_masters

root@linux:~ # ls -l /proc/net/bonding
total 0
-r–r–r– 1 root root 0 Sep 28 08:59 bond0
-r–r–r– 1 root root 0 Sep 28 08:59 bond1
-r–r–r– 1 root root 0 Sep 28 08:59 bond2

The configuration file for bond1.1504 was missing the parameter VLAN=yes, which the working bond0.1504 file has. So I added the parameter

root@linux:~ # cat /etc/sysconfig/network-scripts/ifcfg-bond1.1504
DEVICE=bond1.1504
BOOTPROT=none
ONBOOT=yes
IPADDRES=10.32.28.175
NETMASK=255.255.255.0
BONDING_OPTS="miimon=1000 mode=active-backup"

root@linux:~ # cat /etc/sysconfig/network-scripts/ifcfg-bond0.1504
DEVICE=bond0.1504
BOOTPROT=none
ONBOOT=yes
IPADDR=10.32.17.87
NETMASK=255.255.254.0
BONDING_OPTS="miimon=1000 mode=active-backup"
VLAN=yes
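For reference, a corrected ifcfg-bond1.1504 modeled on the working bond0.1504 file above, with VLAN=yes added; it also assumes the broken file's BOOTPROT and IPADDRES spellings were typos for the standard BOOTPROTO and IPADDR keys:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond1.1504 (corrected sketch)
DEVICE=bond1.1504
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.32.28.175
NETMASK=255.255.255.0
BONDING_OPTS="miimon=1000 mode=active-backup"
VLAN=yes
```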

Bring the network interface up

root@linux:~ # ifup ifcfg-bond1.1504

And I manually configured the IP address and netmask shown in the configuration file

root@linux:~ # ifconfig bond1.1504 10.32.28.175 netmask 255.255.255.0

root@linux:~ # ifconfig bond1.1504
bond1.1504 Link encap:Ethernet HWaddr 6C:C2:17:30:88:88
inet addr:10.32.28.175 Bcast:10.32.28.255 Mask:255.255.255.0
inet6 addr: fe80::6ec2:17ff:fe30:8888/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1034 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:78494 (76.6 KiB) TX bytes:468 (468.0 b)

Running UXMONbroker with the module bondmon

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
Wed Sep 28 09:15:19 2016 : INFO : UXMONbondmon is running now, pid=25212
mv: `/dev/null' and `/dev/null' are the same file
Wed Sep 28 09:15:19 2016 : INFO : UXMONbondmon end, pid=25212
