Month: December 2016

SAP HANA – UXMON: file systems were remounted read-only

Node : node99.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : critical
OM Server Time: 2016-12-14 10:28:03
Message : UXMON: file systems were remounted read-only
Msg Group : Hardware
Application : syslog
Object : hardware
Event Type : not_found
Instance Name : not_found

Instruction : No

This is an SAP HANA server, and the alert reports two filesystems as remounted read-only. Both return an I/O error when queried with df.

node99:~ # df -hP /hana/data/LP0/mnt00002 /hana/log/LP0/mnt00002
df: `/hana/data/LP0/mnt00002': Input/output error
df: `/hana/log/LP0/mnt00002': Input/output error
df: no file systems processed

node99:~ # mount | grep mnt00002
/dev/mapper/vg_data1_12-lvol1 on /hana/data/LP0/mnt00002 type xfs (rw,relatime,swalloc,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=32,swidth=32768,noquota)
/dev/mapper/vg_log1_12-lvol1 on /hana/log/LP0/mnt00002 type xfs (rw,relatime,swalloc,attr2,delaylog,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=32,swidth=32768,noquota)
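
Note that the mount table above still lists both filesystems as rw even though df fails, so the mount options alone do not tell the whole story. As a minimal sketch, assuming a hypothetical helper named mount_flag (not part of any monitoring tool), the rw/ro option for a mount point can be pulled out of /proc/mounts-style lines like this:

```shell
#!/bin/sh
# mount_flag MOUNTPOINT [FILE]
# Hypothetical helper: prints "rw" or "ro" for MOUNTPOINT, reading lines in
# /proc/mounts format (device mountpoint fstype options dump pass).
# FILE defaults to /proc/mounts; prints nothing if MOUNTPOINT is not listed.
mount_flag() {
    awk -v mp="$1" '$2 == mp {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro" || opts[i] == "rw") { print opts[i]; exit }
    }' "${2:-/proc/mounts}"
}

# Example: mount_flag /hana/data/LP0/mnt00002
```

This only reports what the kernel has recorded for the mount; an actual read with df or ls is still needed to catch an I/O error like the one above.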

According to the SAP team, these two filesystems stay mounted read-only until an SAP HANA node fails, at which point they are mounted read-write. The alert is therefore expected here, so it is better to disable monitoring for this message.

root@linux:~ # cp -p /var/opt/OV/bin/instrumentation/hw_mon.cfg /var/opt/OV/conf/OpC

Edit the copied file and add the string "file systems were remounted read-only" as an ignore string

root@linux:~ # vi /var/opt/OV/conf/OpC/hw_mon.cfg
#############################################################################
#@ $Id: hw_mon.cfg 2149 2015-03-03 08:45:34Z zhaofeif $
#@ $Rev: 2149 $
#@ $Author: zhaofeif $
#@ $Date: 2015-03-03 16:45:34 +0800 (Tue, 03 Mar 2015) $
#@ $LastChangedBy: zhaofeif $
##############################################################################
#[REARM = TRUE|FALSE]
#[disable = yes|no]
#[interval = ]
#[ignore string]
#
# rearm
#===============
# If set TRUE (or true), rearm function is enabled, default is disabled
# disable
#===============
# If set disable to YES (or yes), this module won't run anytime
#
# interval
#===============
# The module will allow to run after the interval minutes every time

# ignore string
#===============
# All the output got from command match the ignore string will not record to hw_mon.log that means this kind of hardware error will be ignored

# The below lines are default predefined strings for selection as ignore string which user can uncomment out if he/she need ignore the kind of error
# Please don't modify the below predefined strings which just need the operation of comment or uncomment for you.

#REARM = true

#Power Supply Error
#FAN Error
#Thermal Sensor Error
#Memory Failed
#CPU Failed
#Physical Drive Failed
#Drive Array Accelerator Battery Failed
file systems were remounted read-only
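
If the same change has to be made on several servers, the edit can be scripted instead of done in vi. The following is only a sketch under the assumption that an ignore string is simply a line of its own in hw_mon.cfg, as in the file above; add_ignore_string is a hypothetical helper name:

```shell
#!/bin/sh
# add_ignore_string FILE STRING
# Hypothetical helper: appends STRING as its own line to FILE unless an
# identical whole line is already present, so running it twice is harmless.
add_ignore_string() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example:
# add_ignore_string /var/opt/OV/conf/OpC/hw_mon.cfg \
#     "file systems were remounted read-only"
```

The -x and -F options make grep compare whole lines literally, so a commented-out line such as "#FAN Error" does not count as already present.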

Putty: Terminal freezes running cat on Suse Linux

For some reason, I couldn't run cat on a file: the terminal appeared to freeze.
I worked around the problem by first connecting to a Red Hat Enterprise Linux 6 server and then connecting to the Suse Linux server.

Linux EXT4-fs: error (device dm-156): ext4_lookup: deleted inode referenced: 1091357

Node : serviceguardnode2.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : minor
OM Server Time: 2016-12-22 18:22:32
Message : EXT4-fs: error (device dm-156): ext4_lookup: deleted inode referenced: 1091357
Msg Group : OS
Application : dmsg_mon
Object : EXT4
Event Type : not_found
Instance Name : not_found

Instruction : No

Checking which device is complaining. dm-156 is /dev/vgWPJ/lv_orawp0

root@serviceguardnode2:/dev/mapper # ls -l | grep 156
lrwxrwxrwx. 1 root root 9 Dec 14 22:15 vgWPJ-lv_orawp0 -> ../dm-156
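
Grepping for 156 would also match a device such as dm-1560. As a minimal sketch, assuming a hypothetical helper named dm_to_name, the symlinks in /dev/mapper can be compared against the exact dm node name instead:

```shell
#!/bin/sh
# dm_to_name DM_NODE [DIR]
# Hypothetical helper: prints the names of symlinks in DIR (default
# /dev/mapper) whose target ends in exactly DM_NODE, for example dm-156.
dm_to_name() {
    dir="${2:-/dev/mapper}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # skip non-symlinks such as "control"
        target=$(readlink "$link")          # e.g. ../dm-156
        [ "${target##*/}" = "$1" ] && basename "$link"
    done
    return 0
}

# Example: dm_to_name dm-156
```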

The filesystem is currently mounted

root@serviceguardnode2:/dev/mapper # mount | grep lv_orawp0
/dev/mapper/vgWPJ-lv_orawp0 on /oracle/WPJ type ext4 (rw,errors=remount-ro,data_err=abort,barrier=0)

And the logical volume is open

root@serviceguardnode2:~ # lvs vgWPJ
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_ora11264 vgWPJ -wi-ao---- 30.00g
lv_orawp0 vgWPJ -wi-ao---- 5.00g

This is a clustered environment, and the package is currently running on the other node

root@serviceguardnode2:/dev/mapper # cmviewcl | grep -i wpj
dbWPJ up running enabled serviceguardnode1
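
To compare the node that owns the package with the local hostname in a script, the node column can be extracted from the cmviewcl line. This is a sketch that assumes the five-column layout shown above (PACKAGE STATUS STATE AUTO_RUN NODE); package_node is a hypothetical helper name:

```shell
#!/bin/sh
# package_node PACKAGE
# Hypothetical helper: reads cmviewcl-style lines on stdin and prints the
# fifth column (NODE) of the line whose first column is exactly PACKAGE.
package_node() {
    awk -v pkg="$1" '$1 == pkg && NF >= 5 { print $5 }'
}

# Example: cmviewcl | package_node dbWPJ
```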

There is a Red Hat note about this error, titled "ext4_lookup: deleted inode referenced" errors in /var/log/messages in RHEL 6.

In a clustered environment, which is the case here, a node logs these errors in /var/log/messages when the other node also has the filesystem mounted

root@serviceguardnode2:~ # cmviewcl -v -p dbWPJ

PACKAGE STATUS STATE AUTO_RUN NODE
dbWPJ up running enabled serviceguardnode1

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 5 0 dbWPJmon
Subnet up 10.106.10.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled serviceguardnode1 (current)
Alternate up enabled serviceguardnode2

Dependency_Parameters:
DEPENDENCY_NAME NODE_NAME SATISFIED
dbWP0_dep serviceguardnode2 no
dbWP0_dep serviceguardnode1 yes

Other_Attributes:
ATTRIBUTE_NAME ATTRIBUTE_VALUE
Style modular
Priority no_priority

Checking the filesystems. I need to unmount /oracle/WPJ, but first I need to unmount everything mounted under it; otherwise umount reports that /oracle/WPJ is busy

root@serviceguardnode2:~ # df -hP | grep WPJ
/dev/mapper/vgSAP-lv_WPJ_sys 93M 1.6M 87M 2% /usr/sap/WPJ/SYS
/dev/mapper/vgWPJ-lv_orawp0 4.4G 162M 4.0G 4% /oracle/WPJ
/dev/mapper/vgWPJ-lv_ora11264 27G 4.7G 21G 19% /oracle/WPJ/11204
/dev/mapper/vgWPJlog2-lv_origlogb 2.0G 423M 1.4G 23% /oracle/WPJ/origlogB
/dev/mapper/vgWPJlog2-lv_mirrloga 2.0G 404M 1.5G 22% /oracle/WPJ/mirrlogA
/dev/mapper/vgWPJlog1-lv_origloga 2.0G 423M 1.4G 23% /oracle/WPJ/origlogA
/dev/mapper/vgWPJlog1-lv_mirrlogb 2.0G 404M 1.5G 22% /oracle/WPJ/mirrlogB
/dev/mapper/vgWPJdata-lv_sapdata4 75G 21G 55G 28% /oracle/WPJ/sapdata4
/dev/mapper/vgWPJdata-lv_sapdata3 75G 79M 75G 1% /oracle/WPJ/sapdata3
/dev/mapper/vgWPJdata-lv_sapdata2 75G 7.3G 68G 10% /oracle/WPJ/sapdata2
/dev/mapper/vgWPJdata-lv_sapdata1 75G 1.1G 74G 2% /oracle/WPJ/sapdata1
/dev/mapper/vgWPJoraarch-lv_oraarch 20G 234M 19G 2% /oracle/WPJ/oraarch
scsWPJ:/export/sapmnt/WPJ/profile 4.4G 4.0M 4.1G 1% /sapmnt/WPJ/profile
scsWPJ:/export/sapmnt/WPJ/exe 4.4G 2.5G 1.7G 61% /sapmnt/WPJ/exe

Unmounting everything under /oracle/WPJ, then /oracle/WPJ itself

root@serviceguardnode2:~ # umount /oracle/WPJ/11204
root@serviceguardnode2:~ # umount /oracle/WPJ/origlogB
root@serviceguardnode2:~ # umount /oracle/WPJ/mirrlogA
root@serviceguardnode2:~ # umount /oracle/WPJ/origlogA
root@serviceguardnode2:~ # umount /oracle/WPJ/mirrlogB
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata4
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata3
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata2
root@serviceguardnode2:~ # umount /oracle/WPJ/sapdata1
root@serviceguardnode2:~ # umount /oracle/WPJ/oraarch
root@serviceguardnode2:~ # umount /oracle/WPJ
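
The order matters only in that /oracle/WPJ itself must come last. As a sketch, assuming a hypothetical helper named mounts_under, the list of mount points can be filtered and reverse-sorted so that nested mounts always come before their parent:

```shell
#!/bin/sh
# mounts_under PREFIX
# Hypothetical helper: reads mount points, one per line, on stdin and prints
# PREFIX and everything below it in reverse byte order, so that a nested
# mount point is always listed before the directory it is mounted in.
mounts_under() {
    awk -v p="$1" '$0 == p || index($0, p "/") == 1' | LC_ALL=C sort -r
}

# Example (column 6 of df -P is the mount point, assuming no spaces in paths):
# df -P | awk 'NR > 1 { print $6 }' | mounts_under /oracle/WPJ
```

The output can be reviewed first and then fed to umount one line at a time.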

Linux LVM: Couldn’t find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc

On this server, any command that uses LVM returns an error message complaining about a missing disk

root@linux:~ # pvs
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
WARNING: Inconsistent metadata found for VG oraclevg - updating to use version 24
PV VG Fmt Attr PSize PFree
/dev/mapper/crashvgp1 oraclevg lvm2 a--u 99.98g 99.98g
/dev/mapper/mpathbp1 oraclevg lvm2 a--u 299.96g 299.96g
/dev/mapper/oraclevg_1p1 oraclevg lvm2 a--u 99.98g 0
/dev/mapper/oraclevg_2p1 oraclevg lvm2 a--u 49.98g 0
/dev/sda2 rootvg lvm2 a--u 279.12g 143.62g
unknown device oraclevg lvm2 a-mu 49.98g 49.98g

Volume group oraclevg shows up twice, with two different UUIDs

root@linux:~ # vgs -v
Using volume group(s) on command line.
Cache: Duplicate VG name oraclevg: Existing 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT (created here) takes precedence over R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
Archiving volume group "oraclevg" metadata (seqno 33).
Archiving volume group "oraclevg" metadata (seqno 3).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 3).
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
WARNING: Inconsistent metadata found for VG oraclevg - updating to use version 34
There are 1 physical volumes missing.
There are 1 physical volumes missing.
Archiving volume group "oraclevg" metadata (seqno 3).
Archiving volume group "oraclevg" metadata (seqno 35).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 35).
VG Attr Ext #PV #LV #SN VSize VFree VG UUID VProfile
oraclevg wz--n- 4.00m 2 2 0 149.96g 0 R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
oraclevg wz-pn- 4.00m 3 0 0 449.93g 449.93g 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT
rootvg wz--n- 4.00m 1 10 0 279.12g 143.62g 685XSf-7Dsf-76oL-5pp7-t27Z-nT1o-dqXuUB
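
Spotting the duplicate by eye works for three volume groups but not for many. As a minimal sketch, assuming a hypothetical helper named dup_vg_names fed with name/UUID pairs, the names that occur with more than one UUID can be listed like this:

```shell
#!/bin/sh
# dup_vg_names
# Hypothetical helper: reads "vg_name vg_uuid" pairs on stdin and prints, in
# sorted order, each VG name that occurs with more than one distinct UUID.
dup_vg_names() {
    awk 'NF >= 2 && !seen[$1 SUBSEP $2]++ { count[$1]++ }
         END { for (name in count) if (count[name] > 1) print name }' | sort
}

# Example:
# vgs --noheadings -o vg_name,vg_uuid 2>/dev/null | dup_vg_names
```

Standard error is discarded in the example because, as seen above, LVM prints its warnings about the missing device there.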

To view the properties of one specific volume group, use --select vg_uuid= with the UUID taken from the previous command

root@linux:~ # vgdisplay -v --select vg_uuid=5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT
Using volume group(s) on command line.
Cache: Duplicate VG name oraclevg: Existing 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT (created here) takes precedence over R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
Archiving volume group "oraclevg" metadata (seqno 53).
Archiving volume group "oraclevg" metadata (seqno 3).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 3).
Couldn't find device with uuid unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc.
There are 1 physical volumes missing.
There are 1 physical volumes missing.
Archiving volume group "oraclevg" metadata (seqno 3).
Archiving volume group "oraclevg" metadata (seqno 53).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 53).
--- Volume group ---
VG Name oraclevg
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 53
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 3
Act PV 2
VG Size 449.93 GiB
PE Size 4.00 MiB
Total PE 115181
Alloc PE / Size 0 / 0
Free PE / Size 115181 / 449.93 GiB
VG UUID 5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT

--- Physical volumes ---
PV Name /dev/mapper/crashvgp1
PV UUID Q8XgjC-wgao-uABU-6o39-9SVO-DSwE-zFcTSb
PV Status allocatable
Total PE / Free PE 25595 / 25595

PV Name unknown device
PV UUID unHhGy-Fg3A-Y8wU-PrWh-hwWx-Ki0R-D6Qasc
PV Status allocatable
Total PE / Free PE 12795 / 12795

PV Name /dev/mapper/mpathbp1
PV UUID IMYMJx-H5xY-d16M-M63Q-1lHt-4oLN-xtzoeJ
PV Status allocatable
Total PE / Free PE 76791 / 76791

Many LVM commands can be run with --select vg_uuid

root@linux:~ # vgchange -a n --select vg_uuid=5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT
WARNING: Inconsistent metadata found for VG oraclevg - updating to use version 54
Volume group "oraclevg" successfully changed
0 logical volume(s) in volume group "oraclevg" now active

The oraclevg copy that is missing a physical volume has no logical volumes (Cur LV 0), so I am removing it and forcing the removal

root@linux:~ # vgremove --select vg_uuid=5Rxet9-eL9E-8hFU-8m98-pVLh-gZMD-e4vZBT -f
Volume group "oraclevg" successfully removed

Running vgs -v no longer shows the duplicate

root@linux:~ # vgs -v
Using volume group(s) on command line.
Archiving volume group "oraclevg" metadata (seqno 3).
Creating volume group backup "/etc/lvm/backup/oraclevg" (seqno 3).
VG Attr Ext #PV #LV #SN VSize VFree VG UUID VProfile
oraclevg wz--n- 4.00m 2 2 0 149.96g 0 R8fkNM-1vrs-S4DF-reUZ-1pts-zhxk-EHVT1K
rootvg wz--n- 4.00m 1 10 0 279.12g 143.62g 685XSf-7Dsf-76oL-5pp7-t27Z-nT1o-dqXuUB