
UXMON: slave eno49 of bonding device bond0 in Red Hat Enterprise Linux 7

ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls. ensure your Ticket List/View includes the “Assignee” column, monitor this ticket until the user “ABOPERATOR” is no longer assigned, BEFORE you start work on this ticket.
Node : linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : major
OM Server Time: 2017-03-07 09:50:03
Message : UXMON: slave eno49 of bonding device bond0
Msg Group : OS
Application : bondmon
Object : bond
Event Type :
not_found

Instance Name :
not_found

Instruction : The ‘cat /sys/class/net/bondX/slave_ethY/operstate’ command shows detail

Please check /var/opt/OV/log/OpC/bond_mon.log for more details

Running UXMONbroker by hand shows that the files the monitor tries to read do not exist

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
Mon Mar 13 15:47:35 2017 : INFO : UXMONbondmon is running now, pid=17458
cat: /sys/class/net/bond0/slave_eno49/operstate: No such file or directory
Mon Mar 13 15:47:35 2017 : Major: slave eno49 of bonding device bond0
cat: /sys/class/net/bond0/slave_eno50/operstate: No such file or directory
Mon Mar 13 15:47:35 2017 : Major: slave eno50 of bonding device bond0
mv: ‘/dev/null’ and ‘/dev/null’ are the same file
Mon Mar 13 15:47:35 2017 : INFO : UXMONbondmon end, pid=17458

root@linux:~ # ls -l /sys/class/net/bond0/
total 0
-r--r--r-- 1 root root 4096 Mar 2 13:35 addr_assign_type
-r--r--r-- 1 root root 4096 Mar 7 17:38 address
-r--r--r-- 1 root root 4096 Mar 7 17:38 addr_len
drwxr-xr-x 2 root root 0 Mar 2 13:35 bonding
-r--r--r-- 1 root root 4096 Mar 7 17:38 broadcast
-rw-r--r-- 1 root root 4096 Mar 7 17:38 carrier
-r--r--r-- 1 root root 4096 Mar 7 17:38 carrier_changes
-r--r--r-- 1 root root 4096 Mar 2 13:35 dev_id
-r--r--r-- 1 root root 4096 Mar 7 17:38 dev_port
-r--r--r-- 1 root root 4096 Mar 7 17:38 dormant
-r--r--r-- 1 root root 4096 Mar 7 17:38 duplex
-rw-r--r-- 1 root root 4096 Mar 7 17:38 flags
-rw-r--r-- 1 root root 4096 Mar 7 17:38 gro_flush_timeout
-rw-r--r-- 1 root root 4096 Mar 7 17:38 ifalias
-r--r--r-- 1 root root 4096 Mar 2 13:35 ifindex
-r--r--r-- 1 root root 4096 Mar 2 13:35 iflink
-r--r--r-- 1 root root 4096 Mar 7 17:38 link_mode
lrwxrwxrwx 1 root root 0 Mar 7 17:38 lower_eno49 -> ../../../pci0000:00/0000:00:02.0/0000:06:00.0/net/eno49
lrwxrwxrwx 1 root root 0 Mar 7 17:38 lower_eno50 -> ../../../pci0000:00/0000:00:02.0/0000:06:00.1/net/eno50
-rw-r--r-- 1 root root 4096 Mar 7 17:38 mtu
-rw-r--r-- 1 root root 4096 Mar 7 17:38 netdev_group
-r--r--r-- 1 root root 4096 Mar 7 17:38 operstate
-r--r--r-- 1 root root 4096 Mar 2 13:35 phys_port_id
drwxr-xr-x 2 root root 0 Mar 2 13:41 power
drwxr-xr-x 34 root root 0 Mar 2 13:35 queues
-r--r--r-- 1 root root 4096 Mar 7 17:38 speed
drwxr-xr-x 2 root root 0 Mar 2 13:41 statistics
lrwxrwxrwx 1 root root 0 Mar 2 13:35 subsystem -> ../../../../class/net
-rw-r--r-- 1 root root 4096 Mar 7 17:38 tx_queue_len
-r--r--r-- 1 root root 4096 Mar 2 13:35 type
-rw-r--r-- 1 root root 4096 Mar 2 13:35 uevent

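This is a Red Hat Enterprise Linux 7 server. The current HPOM bondmon module has a bug: it queries slave_eno49 and slave_eno50 under /sys/class/net/bond0, while this system exposes the slaves as lower_eno49 and lower_eno50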

root@linux:~ # cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.3"
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Red Hat Enterprise Linux Server release 7.3 (Maipo)
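
The path mismatch can be checked by hand. The sketch below is only an illustration and not HPOM's own code: bond_slave_state is a hypothetical helper that first looks for the lower_<slave> link seen in the listing above and then falls back to the slave_<slave> path the monitor queries. SYSFS_NET is an assumed override so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Hypothetical helper: print the operstate of one slave of a bonding device.
# SYSFS_NET (assumption) overrides the sysfs root; it defaults to /sys/class/net.
bond_slave_state() {
    bond=$1
    slave=$2
    base=${SYSFS_NET:-/sys/class/net}/$bond
    # Try the lower_<slave> link first, then the slave_<slave> name.
    for link in "lower_$slave" "slave_$slave"; do
        if [ -r "$base/$link/operstate" ]; then
            cat "$base/$link/operstate"
            return 0
        fi
    done
    echo "not_found"
    return 1
}
```

For example, bond_slave_state bond0 eno49 prints the content of whichever operstate file exists, or not_found when neither path is present.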

Copy the bond_mon.cfg configuration file and disable this module

cp -p /var/opt/OV/bin/instrumentation/bond_mon.cfg /var/opt/OV/conf/OpC
vi /var/opt/OV/conf/OpC/bond_mon.cfg
disable = yes

root@linux:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check bondmon
root@linux:~ #
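
The manual edit can also be scripted. This is a minimal sketch, assuming the override file uses the plain "disable = yes" line shown above; set_disable_yes is a hypothetical name, not an HP-provided tool.

```shell
#!/bin/sh
# Hypothetical helper: force "disable = yes" in a UXMON override file.
# Assumes the "key = value" layout shown in the post.
set_disable_yes() {
    cfg=$1
    if grep '^[ ]*disable[ ]*=' "$cfg" >/dev/null 2>&1; then
        # Rewrite the existing uncommented setting.
        # Writing back with cat keeps the file's owner and mode.
        sed 's/^[ ]*disable[ ]*=.*/disable = yes/' "$cfg" > "$cfg.tmp" &&
            cat "$cfg.tmp" > "$cfg" && rm -f "$cfg.tmp"
    else
        # No active setting yet, so append one.
        echo 'disable = yes' >> "$cfg"
    fi
}
```

Usage would be set_disable_yes /var/opt/OV/conf/OpC/bond_mon.cfg after the cp -p step, followed by the same UXMONbroker check.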


Solaris 9 Branded Zone was not starting ftp when running kill -HUP

I have a Solaris 9 Branded Zone

root@solaris9:/ # uname -a
SunOS solaris9 5.9 Generic_Virtual sun4v sparc sun4v

Configured to run FTP

root@solaris9:/ # grep ftp /etc/inetd.conf
# ftp telnet shell login exec tftp finger printer
# TFTPD – tftp server (primarily used for booting)
#tftp dgram udp6 wait root /usr/sbin/in.tftpd in.tftpd -s /tftpboot
ftp stream tcp6 nowait root /usr/sbin/in.ftpd in.ftpd -l

But it was not working

root@solaris9:/ # ps -ef | grep ftp
root 10137 13230 0 13:31:28 pts/4 0:00 grep ftp

root@solaris9:/ # ps -ef | grep inet
root 12579 13230 0 13:31:34 pts/4 0:00 grep inet
root 1325 12833 0 Mar 12 ? 0:00 /usr/sbin/inetd -s start

Sent kill -HUP to inetd, but nothing was listening on port 21 afterwards (the two lines below are port 1521, which also matches grep 21)

root@solaris9:/ # kill -HUP 1325

root@solaris9:/ # netstat -an | grep 21 | grep LISTEN
142.40.236.158.1521 *.* 0 0 1048576 0 LISTEN
142.40.236.10.1521 *.* 0 0 1048576 0 LISTEN

Stopped and started inetsvc

root@solaris9:/ # /etc/init.d/inetsvc stop
root@solaris9:/ # /etc/init.d/inetsvc start

root@solaris9:/ # ps -ef | grep inet
root 12098 12833 0 13:49:02 ? 0:00 /usr/sbin/inetd -s
root 15358 3734 0 13:49:05 pts/4 0:00 grep inet

FTP working again

root@solaris9:/ # netstat -an | grep 21 | grep LISTEN
142.40.236.158.1521 *.* 0 0 1048576 0 LISTEN
142.40.236.10.1521 *.* 0 0 1048576 0 LISTEN
*.21 *.* 0 0 1048576 0 LISTEN
*.21 *.* 0 0 1048576 0 LISTEN
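
Because grep 21 also matches port 1521, a narrower filter makes the check less ambiguous. This is a small sketch; ftp_listeners is a hypothetical name, and the pattern assumes the netstat layout shown above, where the local address is followed by a space.

```shell
#!/bin/sh
# Hypothetical filter: keep only LISTEN lines that contain an address
# ending in ".21". Reads "netstat -an" output on stdin.
ftp_listeners() {
    # "[.]21 " needs a dot right before 21, so ".1521 " does not match.
    grep '[.]21 .*LISTEN'
}
```

It would be used as netstat -an | ftp_listeners, which prints nothing while FTP is down.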

One path missing in disk map on multipath device

Showing a particular case:

The disk mpath5 was only showing one path

root@linux:~ # multipath -ll mpath5
mpath5 (350002ac19430374a) dm-17 3PARdata,VV
[size=47G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 2:0:0:1 sdh 8:112 [active][ready]

The disk used by the operating system is cciss/c0d0

root@linux:~ # pvs
PV VG Fmt Attr PSize PFree
/dev/cciss/c0d0p3 vg00 lvm2 a-- 269.47G 203.28G
/dev/mpath/350002ac19429374a vgapp lvm2 a-- 100.00G 0
/dev/mpath/350002ac1942c374a vgapp lvm2 a-- 20.00G 0
/dev/mpath/350002ac1942e374a vgapp lvm2 a-- 75.00G 0
/dev/mpath/350002ac1942f374a vgapp lvm2 a-- 158.00G 0
/dev/mpath/350002ac19430374a vgapp lvm2 a-- 47.00G 996.00M
/dev/mpath/350002ac22869374a vgapp lvm2 a-- 100.00G 0
/dev/mpath/350002ac2286a374a vgapp lvm2 a-- 40.00G 0

Listing the SCSI devices. sda through sdn are used

root@linux:~ # lsscsi
[1:0:0:1] disk 3PARdata VV 3213 /dev/sda
[1:0:0:2] disk 3PARdata VV 3213 /dev/sdb
[1:0:0:3] disk 3PARdata VV 3213 /dev/sdc
[1:0:0:4] disk 3PARdata VV 3213 /dev/sdd
[1:0:0:5] disk 3PARdata VV 3213 /dev/sde
[1:0:0:6] disk 3PARdata VV 3213 /dev/sdf
[1:0:0:7] disk 3PARdata VV 3213 /dev/sdg
[1:0:0:254] enclosu 3PARdata SES 3213 -
[2:0:0:1] disk 3PARdata VV 3213 /dev/sdh
[2:0:0:2] disk 3PARdata VV 3213 /dev/sdi
[2:0:0:3] disk 3PARdata VV 3213 /dev/sdj
[2:0:0:4] disk 3PARdata VV 3213 /dev/sdk
[2:0:0:5] disk 3PARdata VV 3213 /dev/sdl
[2:0:0:6] disk 3PARdata VV 3213 /dev/sdm
[2:0:0:7] disk 3PARdata VV 3213 /dev/sdn
[2:0:0:254] enclosu 3PARdata SES 3213 -

Checking /etc/multipath.conf showed that sda was blacklisted, so I commented out that line

root@linux:~ # grep -v ^# /etc/multipath.conf

blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z][[0-9]*]"
devnode "^hd[a-z]"
#devnode "^sda$"
}

defaults {
user_friendly_names yes
}

Running multipath -v3

root@linux:~ # multipath -v3

Checking disk mpath5

root@linux:~ # multipath -ll mpath5
mpath5 (350002ac19430374a) dm-17 3PARdata,VV
[size=47G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:1 sda 8:0 [active][ready]
\_ 2:0:0:1 sdh 8:112 [active][ready]
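
To spot maps with a missing path without reading each listing, the path lines can be counted. This is a rough sketch: count_paths is a hypothetical name, and the pattern assumes path lines contain an H:C:T:L tuple followed by an sd device, as in the output above. The layout differs between multipath-tools versions, so treat the pattern as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: count path lines in "multipath -ll <map>" output
# read from stdin. A path line is assumed to contain "H:C:T:L sdX".
count_paths() {
    grep -c '[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]* sd[a-z]'
}
```

For example, multipath -ll mpath5 | count_paths printed against the output above gives 2, and it would have given 1 before the blacklist entry was commented out.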

Remove ^M (Control-M) characters from a file in Unix

When checking my /etc/multipath.conf file, I found many hidden ^M characters (carriage returns) at the end of lines

root@linux:~ # cat -vet /etc/multipath.conf

To remove them, use tr and redirect the output to a new file.

root@linux:~ # tr -d '\015' < /etc/multipath.conf > /tmp/multipath.conf

Check the file and if everything is okay, replace the original file
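
The whole procedure can be wrapped in a small function. This is a sketch only: strip_cr is a hypothetical name, and the backup file name in the comments is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of file $1 without carriage returns to $2.
strip_cr() {
    tr -d '\015' < "$1" > "$2"
}

# Example workflow (paths from the post; the .bak name is an assumption):
#   strip_cr /etc/multipath.conf /tmp/multipath.conf
#   cat -vet /tmp/multipath.conf | grep -c '\^M'   # should print 0 once the CRs are gone
#   cp -p /etc/multipath.conf /etc/multipath.conf.bak
#   cat /tmp/multipath.conf > /etc/multipath.conf
```

Writing the cleaned content back with cat rather than mv keeps the owner and permissions of the original file.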

LVM: Please specify number of stripes (-i) and stripesize (-I)

I tried to resize a logical volume, but lvresize asked me to specify the number of stripes and the stripe size

root@linux:~ # lvresize -L +200g -r /dev/vgER0data/lv_sapdata1
Please specify number of stripes (-i) and stripesize (-I)
Run `lvresize --help' for more information.

Showing stripes and stripesize

root@linux:~ # lvdisplay /dev/vgER0data/lv_sapdata1 -m
--- Logical volume ---
LV Path /dev/vgER0data/lv_sapdata1
LV Name lv_sapdata1
VG Name vgER0data
LV UUID h2WKAp-h16f-O3Bo-QpEz-fTiW-s3Hr-K7W9sb
LV Write Access read/write
LV Creation host, time cavadb77, 2015-09-24 11:31:33 -0300
LV Status available
# open 1
LV Size 2.99 TiB
Current LE 783355
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 32768
Block device 253:423

--- Segments ---
Logical extents 0 to 655359:
Type striped
Stripes 2
Stripe size 4.00 MiB
Stripe 0:
Physical volume /dev/mapper/ER0_data_disk_001p1
Physical extents 0 to 327679
Stripe 1:
Physical volume /dev/mapper/ER0_data_disk_002p1
Physical extents 0 to 327679

Logical extents 655360 to 783354:
Type linear
Physical volume /dev/mapper/ER0_data_disk_003
Physical extents 0 to 127994

In this case the fix was to add a new LUN and resize the logical volume, passing the stripe count and stripe size of the existing striped segment (-i2 -I4M). The linear segment was kept as it is; there is no way to convert it

root@linux:~ # lvresize -L +100g -r -i2 -I4M /dev/mapper/vgER0data-lv_sapdata1
Size of logical volume vgER0data/lv_sapdata1 changed from 2.99 TiB (783355 extents) to 3.09 TiB (808955 extents).
Logical volume lv_sapdata1 successfully resized
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vgER0data-lv_sapdata1 is mounted on /oracle/ER0/sapdata1; on-line resizing required
old desc_blocks = 192, new_desc_blocks = 198
Performing an on-line resize of /dev/mapper/vgER0data-lv_sapdata1 to 828369920 (4k) blocks.
The filesystem on /dev/mapper/vgER0data-lv_sapdata1 is now 828369920 blocks long.
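
The figures in this output can be cross-checked with shell arithmetic. The sketch below assumes a 4 MiB physical extent size (the LVM default, not shown in the post) and the 4 KiB filesystem block size reported by resize2fs; both function names are hypothetical.

```shell
#!/bin/sh
# extents_after_grow CURRENT_EXTENTS GROW_GIB EXTENT_MIB
# Extent count after growing the LV by GROW_GIB.
extents_after_grow() {
    echo $(( $1 + $2 * 1024 / $3 ))
}

# fs_blocks EXTENTS EXTENT_MIB BLOCK_KIB
# Number of filesystem blocks that fit in EXTENTS extents.
fs_blocks() {
    echo $(( $1 * $2 * 1024 / $3 ))
}

extents_after_grow 783355 100 4   # prints 808955
fs_blocks 808955 4 4              # prints 828369920
```

Both results match the lvresize and resize2fs messages above, which is consistent with the assumed extent size.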

VMware guest server taking a long time to finish boot stuck at Bringing up interface

After rebooting a server, VMware Tools reported failures for the VM communication interface

Starting VMware Tools services in the virtual machine
Switching to guest configuration: OK
Paravirtual SCSI module: OK
Guest memory manager: OK
VM communication interface: FAILED
VM communication interface socket family: FAILED
File system sync driver: OK
Guest operating system daemon: OK
FAILED

The boot then sat for a long time at the message Bringing up interface eth0
Updated VMware Tools (see "VMware Tools does not start after updating the kernel")

I had to change the network interface driver before the server would finish the boot sequence

UXMON: AIX syslog alarm: TAPE DRIVE FAILURE with ID 5537AC5F

Node : aix.setaoffice.com
Node Type : IBM RS/6000 64 HTTPS
Severity : warning
OM Server Time: 2017-01-21 12:03:23
Message : UXMON: AIX syslog alarm: TAPE DRIVE FAILURE with ID 5537AC5F
Msg Group : OS
Application : syslog
Object : 5537AC5F
Event Type :
not_found

Instance Name :
not_found

Instruction : Has been detected an alarm in the AIX errpt module
The annotation of this case will show the template description of this ID
If you feel this event is useless you can filter out it using the uxmonsyslog.cfg, see this
same file and/or documentation for details

Checking device status

root@aix:/ # lsdev -Cc tape | grep rmt109
rmt109 Available 0I-00-02 IBM 3580 Ultrium Tape Drive (FCP)

Checking device information

root@aix:/ # lscfg -vpl rmt109
rmt109 U789D.001.DQD16LD-P1-C1-T1-W50050763004BE309-L0 IBM 3580 Ultrium Tape Drive (FCP)

Manufacturer................IBM
Machine Type and Model......ULT3580-TD5
Serial Number...............00078AE800
Device Specific.(FW)........F990

PLATFORM SPECIFIC

Name: tape
Node: tape
Device Type: byte

In our environment rmt109 is part of a tape library. The BUR team needs to check the tape drive identified by W50050763004BE309

Unable to connect to the MKS: Could not connect to pipe \\.\pipe\vmware-authdpipe within retry period

I was receiving the following error when connecting to the system console through the VMware vSphere client

Unable to connect to the MKS: Could not connect to pipe \\.\pipe\vmware-authdpipe within retry period


I found the knowledge base article "Opening a virtual machine console from vSphere Client fails with the error: Unable to contact the MKS (2032016)"

It says to delete an environment variable called USER and gives the steps to do so.
In my case the variable I deleted was called USERNAME


Trying to become another user on Linux or logging in to a server fails with su: cannot set user id: Resource temporarily unavailable

emerson@linux:~ $ sudo su - oracle
sudo password for emerson:
su: cannot set user id: Resource temporarily unavailable
emerson@linux:~ $

The user has most likely exceeded the soft limit on the number of processes (nproc). Raise it in /etc/security/limits.conf

root@linux:~ # vi /etc/security/limits.conf
oracle soft nproc 2047
oracle hard nproc 16384

Source: “cannot set user id: Resource temporarily unavailable” while trying to login or su as a local user in Red Hat Enterprise Linux
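
Before raising the limit, it helps to compare the configured value with what the user is actually running. This is a minimal sketch: soft_nproc is a hypothetical helper that reads only the file it is given, and on Red Hat systems files under /etc/security/limits.d/ can also set nproc, so the value it prints may not be the effective one.

```shell
#!/bin/sh
# Hypothetical helper: print the last "soft nproc" value set for a user.
# The second argument defaults to /etc/security/limits.conf.
soft_nproc() {
    awk -v u="$1" '
        $1 == u && $2 == "soft" && $3 == "nproc" { v = $4 }
        END { if (v != "") print v }
    ' "${2:-/etc/security/limits.conf}"
}

# Count the user's tasks (one line per thread) to compare with the limit:
#   ps -L -u oracle -o pid= | wc -l
```

If the task count is close to the value printed by soft_nproc oracle, the limit is the likely cause of the error.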

SAP HANA – UXMON: file systems were remounted read-only

Node : node99.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : critical
OM Server Time: 2016-12-14 10:28:03
Message : UXMON: file systems were remounted read-only
Msg Group : Hardware
Application : syslog
Object : hardware
Event Type :
not_found

Instance Name :
not_found

Instruction : No

This is a SAP HANA server, and two filesystems are being reported as remounted read-only. Accessing them returns an I/O error, although mount lists them as rw

node99:~ # df -hP /hana/data/LP0/mnt00002 /hana/log/LP0/mnt00002
df: `/hana/data/LP0/mnt00002': Input/output error
df: `/hana/log/LP0/mnt00002': Input/output error
df: no file systems processed

node99:~ # mount | grep mnt00002
/dev/mapper/vg_data1_12-lvol1 on /hana/data/LP0/mnt00002 type xfs (rw,relatime,swalloc,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=32,swidth=32768,noquota)
/dev/mapper/vg_log1_12-lvol1 on /hana/log/LP0/mnt00002 type xfs (rw,relatime,swalloc,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=32,swidth=32768,noquota)

According to the SAP team, these two filesystems stay mounted read-only until a SAP HANA node fails, and only then are they mounted read-write. In this case it is better to disable monitoring of this message

root@linux:~ # cp -p /var/opt/OV/bin/instrumentation/hw_mon.cfg /var/opt/OV/conf/OpC

Edit the file and add the string "file systems were remounted read-only" as an ignore string

root@linux:~ # vi /var/opt/OV/conf/OpC/hw_mon.cfg
#############################################################################
#@ $Id: hw_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@ $Rev: 2149 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
##############################################################################
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#[ignore string]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
# disable
#===============
# If set disable to YES (or yes), this module won’t run anytime
#
# interval
#===============
# The module will allow to run after the interval minutes every time

# ignore string
#===============
# All the output got from command match the ignore string will not record to hw_mon.log that means this kind of hardware error will be ignored

# The below lines are default predefined strings for selection as ignore string which user can uncomment out if he/she need ignore the kind of error
# Please don’t modify the below predefined strings which just need the operation of comment or uncomment for you.

#REARM = true

#Power Supply Error
#FAN Error
#Thermal Sensor Error
#Memory Failed
#CPU Failed
#Physical Drive Failed
#Drive Array Accelerator Battery Failed
file systems were remounted read-only
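
The same change can be made without an editor. This is a small sketch, assuming ignore strings are plain lines as in the file above; add_ignore_string is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: append an ignore string to a UXMON cfg file,
# but only if that exact line is not already present.
add_ignore_string() {
    cfg=$1
    text=$2
    if ! grep -x -F -- "$text" "$cfg" >/dev/null 2>&1; then
        printf '%s\n' "$text" >> "$cfg"
    fi
}
```

It would be called as add_ignore_string /var/opt/OV/conf/OpC/hw_mon.cfg "file systems were remounted read-only", and running it a second time leaves the file unchanged.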
