Author: Emerson

Suse Linux 9 boot – cciss: cmd f6b80498 has CHECK CONDITION byte 2 = 0x3

At startup I was receiving these messages:

cciss: cmd f6b80498 has CHECK CONDITION byte 2 = 0x3
cciss: cmd f6b80000 has CHECK CONDITION byte 2 = 0x3

I found that this indicates a media error. That made sense, since I had rebooted because the / filesystem had become read-only.
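The "byte 2" value can be decoded by hand. This is a minimal sketch that assumes the value printed by the cciss driver is byte 2 of fixed-format SCSI sense data, where the low four bits hold the sense key. Under that assumption, 0x3 is MEDIUM ERROR. sense_key_name and decode_cciss_log are hypothetical helper names of my own:

```sh
# Assumption: "byte 2" is byte 2 of fixed-format SCSI sense data,
# and its low nibble is the sense key.

# Print the sense key name for a byte value such as 0x3.
sense_key_name() {
  key=$(( $1 & 0x0F ))   # keep only the sense key nibble
  case $key in
    0)  echo "NO SENSE" ;;
    1)  echo "RECOVERED ERROR" ;;
    2)  echo "NOT READY" ;;
    3)  echo "MEDIUM ERROR" ;;
    4)  echo "HARDWARE ERROR" ;;
    5)  echo "ILLEGAL REQUEST" ;;
    6)  echo "UNIT ATTENTION" ;;
    7)  echo "DATA PROTECT" ;;
    8)  echo "BLANK CHECK" ;;
    9)  echo "VENDOR SPECIFIC" ;;
    10) echo "COPY ABORTED" ;;
    11) echo "ABORTED COMMAND" ;;
    13) echo "VOLUME OVERFLOW" ;;
    14) echo "MISCOMPARE" ;;
    *)  echo "RESERVED/OTHER ($key)" ;;
  esac
}

# Read cciss log lines on stdin and print each byte 2 value with its meaning.
decode_cciss_log() {
  sed -n 's/.*CHECK CONDITION byte 2 = \(0x[0-9a-fA-F]*\).*/\1/p' |
  while read -r byte; do
    printf '%s %s\n' "$byte" "$(sense_key_name "$byte")"
  done
}
```

For example, running grep cciss /var/log/messages | decode_cciss_log on the two lines above prints 0x3 MEDIUM ERROR for each one.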

I ended up reinstalling the server.

scstat libsecurity: create of rpc handle to program rgmd_receptionist (100141) failed, will not retry

root@solaris10:/ # scstat -g
libsecurity: create of rpc handle to program rgmd_receptionist (100141) failed, will not retry
scstat: unexpected error.

I'm running Sun Cluster 3.1, which gave me the libsecurity error:

root@solaris10:/ # pkginfo -l SUNWscnmr
PKGINST: SUNWscnmr
NAME: Sun Cluster name (Root)
CATEGORY: application
ARCH: sparc
VERSION: 3.1.0,REV=2005.07.18.14.37
BASEDIR: /
VENDOR: Sun Microsystems, Inc.
DESC: Sun Cluster name (Root)
PSTAMP: 07/18/2005.14:43:46
INSTDATE: Oct 25 2008 20:23
HOTLINE: Please contact your local service provider
STATUS: completely installed
FILES: 3 installed pathnames
2 shared pathnames
2 directories
1 blocks used (approx)

Users were complaining that the server was unavailable, but I was able to log in, so I checked the run level:

root@solaris10:/ # who -r
. run-level 3 Oct 6 13:20 3 0 S

Running svcs -xv, I saw numerous services that had not started yet:

root@solaris10:/ # svcs -xv
svc:/milestone/multi-user:default (multi-user milestone)
State: offline since Tue 06 Oct 2015 01:23:54 PM BRT
Reason: Start method is running.
See: http://sun.com/msg/SMF-8000-C4
See: man -M /usr/share/man -s 1M init
See: /var/svc/log/milestone-multi-user:default.log
Impact: 23 dependent services are not running:
svc:/system/webconsole:console
svc:/system/boot-config:default
svc:/application/stosreg:default
svc:/application/sthwreg:default
svc:/application/management/common-agent-container-1:default
svc:/system/cluster/cl-svc-enable:default
svc:/system/cluster/spm:default
svc:/system/cluster/cl-svc-cluster-milestone:default
svc:/system/cluster/scdpm:default
svc:/system/cluster/rpc-pmf:default
svc:/system/cluster/rgm:default
svc:/system/cluster/scsymon-srv:default
svc:/system/cluster/rpc-fed:default
svc:/system/cluster/pnm:default
svc:/system/cluster/cl-event:default
svc:/system/cluster/cl-eventlog:default
svc:/system/cluster/cl-ccra:default
svc:/milestone/multi-user-server:default
svc:/system/basicreg:default
svc:/system/zones:default
svc:/application/graphical-login/cde-login:default
svc:/system/vxvm/vxvm-recover:default
svc:/application/cde-printinfo:default

svc:/application/print/ipp-listener:default (Internet Print Protocol Listening Service)
State: maintenance since Tue 06 Oct 2015 01:22:52 PM BRT
Reason: Start method died on Killed (9).
See: http://sun.com/msg/SMF-8000-KS
See: man -M /usr/share/man -s 4 mod_ipp
See: /var/svc/log/application-print-ipp-listener:default.log
Impact: This service is not running.

A long time passed and the services still had not started. The multi-user milestone log showed that it was executing a legacy init script. I killed the script, and the remaining services started.

root@solaris10:/ # cat /var/svc/log/milestone-multi-user:default.log
Executing legacy init script "/etc/rc2.d/S74osddownt".
OSD DownTime is being started.
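Finding the stuck script can be scripted. This is a minimal sketch that assumes the last "Executing legacy init script" line in the milestone log is the script still running, because no later line shows it finishing. last_legacy_script is a hypothetical helper name:

```sh
# Read an SMF milestone log on stdin and print the path of the last
# legacy init script it started (assumed to be the one still running).
last_legacy_script() {
  sed -n 's/^Executing legacy init script "\(.*\)"\.$/\1/p' |
  sed -n '$p'   # keep only the last match; avoids tail -n differences
}
```

Usage: script=$(last_legacy_script < /var/svc/log/milestone-multi-user:default.log), then pgrep -fl "$script" to see the matching process before killing its PID.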

Checking AIX HBA link

Run lsattr -El on each fscsi device and look at the attach parameter:

root@aix:/ # lsattr -El fscsi1
attach none How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
root@aix:/ # lsattr -El fscsi3
attach al How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x1 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True

I know that adapter fcs2 is connected, and its fscsi2 shows attach switch:

root@aix:/ # lsattr -El fscsi2
attach switch How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x2bd00 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True

After plugging in the fibre cable, I ran cfgmgr to detect the hardware:

root@aix:/ # cfgmgr

Running lsattr -El again, I see that the attach parameter of fscsi1 changed to switch:

root@aix:/ # lsattr -El fscsi1
attach switch How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x14fdc0 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
root@aix:/ # lsattr -El fscsi3
attach al How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x1 Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True
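To check every adapter at once, a minimal sketch follows. attach_state and check_all_fscsi are hypothetical helper names, and check_all_fscsi assumes that lsdev -C -F name lists the fscsi devices. In the output above, the unplugged fscsi1 showed none and the fabric-attached ports showed switch, so any other value is worth a closer look:

```sh
# Read "lsattr -El fscsiN" output on stdin and print the attach value.
attach_state() {
  awk '$1 == "attach" { print $2 }'
}

# AIX only: print "<device> <attach value>" for every fscsi device.
check_all_fscsi() {
  for dev in $(lsdev -C -F name | grep '^fscsi'); do
    printf '%s %s\n' "$dev" "$(lsattr -El "$dev" | attach_state)"
  done
}
```

Running check_all_fscsi before and after cfgmgr shows which ports changed state.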

root@aix:/ # fcstat fcs1

FIBRE CHANNEL STATISTICS REPORT: fcs1

Device Type: 4Gb FC PCI Express Adapter (df1000fe) (adapter/pciex/df1000fe)
Serial Number: 1C91308281
Option ROM Version: 02E8277F
ZA: Z1F2.70A5
World Wide Node Name: 0x20000000C986EF2F
World Wide Port Name: 0x10000000C986EF2F

FC-4 TYPES:
Supported: 0x0000012000000000000000000000000000000000000000000000000000000000
Active: 0x0000010000000000000000000000000000000000000000000000000000000000
Class of Service: 3
Port Speed (supported): 4 GBIT
Port Speed (running): 4 GBIT
Port FC ID: 0x28af80
Port Type: Fabric
Attention Type: Link Up
Topology: Point to Point or Fabric

Seconds Since Last Reset: 11519187

Transmit Statistics Receive Statistics
-------------------   ------------------
Frames: 4294967295 4294967295
Words: 1099511627520 1099511627520

LIP Count: 0
NOS Count: 0
Error Frames: 0
Dumped Frames: 0
Link Failure Count: 1
Loss of Sync Count: 1
Loss of Signal: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 3
Invalid CRC Count: 0

IP over FC Adapter Driver Information
No DMA Resource Count: 0
No Adapter Elements Count: 0

FC SCSI Adapter Driver Information
No DMA Resource Count: 0
No Adapter Elements Count: 0
No Command Resource Count: 0

IP over FC Traffic Statistics
Input Requests: 0
Output Requests: 0
Control Requests: 0
Input Bytes: 0
Output Bytes: 0

FC SCSI Traffic Statistics
Input Requests: 614967916
Output Requests: 1227673574
Control Requests: 1180997
Input Bytes: 160727856996239
Output Bytes: 321763800652003

HPOM – EXT4-fs: (dm-9): barriers disabled

Node: linux.setaoffice.com
Node Type : Intel/AMD x64(HTTPS)
Severity : minor
OM Server Time: 2015-09-28 08:42:14
Message : EXT4-fs: (dm-9): barriers disabled
Msg Group : OS
Application : dmsg_mon
Object : EXT4
Event Type :
not_found

Instance Name :
not_found

Instruction : No

I had just set up a new filesystem and mounted it with the nobarrier option in /etc/fstab. The kernel logged the following message in /var/log/messages:

root@linux:~ # grep barrier /var/log/messages
Sep 28 09:39:11 linux kernel: EXT4-fs (dm-9): barriers disabled

RHEL 6 Storage Administration Guide – 22.2. Enabling/Disabling Write Barriers

I removed all references to the nobarrier option from /etc/fstab. This is the file before the change:

root@linux:~ # cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon Jun 24 13:57:59 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/rootvg-rootlv / ext4 defaults,nobarrier 1 1
/dev/mapper/rootvg-auditlv /audit ext4 defaults,nobarrier 1 2
UUID=4fdc0716-230e-4c5d-a3fe-ba2196bf6c21 /boot ext3 defaults 1 2
/dev/mapper/rootvg-optlv /opt ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-tmplv /tmp ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-usrlv /usr ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-userslv /usr/users ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-varlv /var ext4 defaults,nobarrier 1 2
/dev/mapper/rootvg-crashlv /var/crash ext3 defaults,nobarrier 1 2
/dev/mapper/rootvg-swaplv swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/mapper/rootvg-repolv /repo ext4 defaults,nobarrier 1 2
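The edit can be done with sed instead of by hand. This is a minimal sketch that assumes nobarrier always follows another option, as in defaults,nobarrier above. A field that contains only nobarrier is not handled. strip_nobarrier is a hypothetical name:

```sh
# Read an fstab on stdin and print it with every ",nobarrier" option removed.
strip_nobarrier() {
  sed -e 's/,nobarrier\([,[:space:]]\)/\1/g' \
      -e 's/,nobarrier$//'
}
```

Usage: strip_nobarrier < /etc/fstab > /etc/fstab.new, then diff /etc/fstab /etc/fstab.new before replacing the file. Editing fstab does not change filesystems that are already mounted. As far as I know, they need a remount (for ext4, mount -o remount,barrier=1 on the mount point) or a reboot to get barriers back.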

AIX lspath Missing path

While reviewing disk paths, I found some paths in the Missing state and others Enabled on the same fscsi device:

root@aix:/ # lspath -l hdisk31
Enabled hdisk31 fscsi0
Missing hdisk31 fscsi0
Enabled hdisk31 fscsi0
Missing hdisk31 fscsi0
Enabled hdisk31 fscsi2
Missing hdisk31 fscsi2
Enabled hdisk31 fscsi2
Missing hdisk31 fscsi2

To correct this, remove the paths and let cfgmgr rediscover them. First, check whether any disk has only one path, because removing that path would leave the disk inaccessible:

root@aix:/ # lspath
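Because the paths are removed one adapter at a time, a stricter version of this check asks whether any disk reaches storage through only one fscsi adapter. This is a minimal sketch that assumes the three-column lspath output shown above (status, disk, parent). single_adapter_disks is a hypothetical name:

```sh
# Read "lspath" output on stdin and print the disks whose paths all go
# through a single parent adapter (one sorted disk name per line).
single_adapter_disks() {
  awk 'NF >= 3 && !seen[$2, $3]++ { n[$2]++ }
       END { for (d in n) if (n[d] == 1) print d }' | sort
}
```

Usage: lspath | single_adapter_disks. Empty output means every disk keeps at least one adapter while the other adapter's paths are removed.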

Remove the paths through one adapter and run cfgmgr, then do the same for the other adapter:

root@aix:/ # rmpath -p fscsi2 -d
root@aix:/ # cfgmgr
root@aix:/ # rmpath -p fscsi0 -d
root@aix:/ # cfgmgr

Afterwards, every path should show as Enabled:

root@aix:/ # lspath -l hdisk31
Enabled hdisk31 fscsi0
Enabled hdisk31 fscsi0
Enabled hdisk31 fscsi2
Enabled hdisk31 fscsi2

Solaris 10 – passwd: password is based on a dictionary word.

root@solaris10:/ # passwd emerson
New Password:
passwd: password is based on a dictionary word.

Edit /etc/default/passwd and comment out DICTIONLIST and DICTIONDBDIR:

root@solaris10:/ # vi /etc/default/passwd
#DICTIONLIST=/usr/share/lib/dict/words
#DICTIONDBDIR=/var/passwd
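The same edit can be made without vi. This is a minimal sketch that assumes both settings start at the beginning of their lines, as in the default file. comment_dictionary_check is a hypothetical name. The change turns the dictionary check off for every user:

```sh
# Read /etc/default/passwd on stdin and print it with the two
# dictionary settings commented out; all other lines are unchanged.
comment_dictionary_check() {
  sed -e 's/^DICTIONLIST=/#&/' \
      -e 's/^DICTIONDBDIR=/#&/'
}
```

Usage: cp -p /etc/default/passwd /etc/default/passwd.orig, then comment_dictionary_check < /etc/default/passwd.orig > /etc/default/passwd.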

Source: On passwords – Part 3: Using a password policy

Solaris 10: emlxs0:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)

If you are seeing messages in /var/adm/messages asking for an HBA reset:

root@solaris10:/ # tail /var/adm/messages
Sep 3 05:49:32 solaris10 scsi: [ID 243001 kern.info] /pci@0,600000/pci@0/scsi@1 (mpt0):
Sep 3 05:49:32 solaris10 mpt_get_sas_device_page0 config: IOCStatus=0x22 IOCLogInfo=0x30030501
Sep 3 09:24:06 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs7:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)
Sep 3 09:24:15 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs5:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)
Sep 3 09:24:25 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs3:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)
Sep 3 09:24:28 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs1:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)
Sep 3 09:24:57 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs6:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)
Sep 3 09:24:58 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs4:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)
Sep 3 09:24:59 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs2:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)
Sep 3 09:25:00 solaris10 emlxs: [ID 349649 kern.info] [ 1.0340]emlxs0:WARNING:1540: Firmware update required. (A manual HBA reset or link reset (using luxadm or fcadm) is required.)

List all the HBA ports:

root@solaris10:/ # fcinfo hba-port | grep Device
OS Device Name: /dev/cfg/c11
OS Device Name: /dev/cfg/c12
OS Device Name: /dev/cfg/c9
OS Device Name: /dev/cfg/c10
OS Device Name: /dev/cfg/c4
OS Device Name: /dev/cfg/c5
OS Device Name: /dev/cfg/c2
OS Device Name: /dev/cfg/c3

Then run luxadm -e forcelip on each OS Device Name, one port at a time, waiting 60 seconds after each:

root@solaris10:/ # luxadm -e forcelip /dev/cfg/c11; sleep 60
root@solaris10:/ # luxadm -e forcelip /dev/cfg/c12; sleep 60
root@solaris10:/ # luxadm -e forcelip /dev/cfg/c9; sleep 60
root@solaris10:/ # luxadm -e forcelip /dev/cfg/c10; sleep 60
root@solaris10:/ # luxadm -e forcelip /dev/cfg/c4; sleep 60
root@solaris10:/ # luxadm -e forcelip /dev/cfg/c5; sleep 60
root@solaris10:/ # luxadm -e forcelip /dev/cfg/c2; sleep 60
root@solaris10:/ # luxadm -e forcelip /dev/cfg/c3; sleep 60
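The same sequence can be generated from the fcinfo output so that no port is missed. This is a minimal sketch that assumes each port's device path is the last field of its "OS Device Name:" line, as above. hba_devices and forcelip_all are hypothetical names:

```sh
# Read "fcinfo hba-port" output on stdin and print each OS device path.
hba_devices() {
  awk '/OS Device Name:/ { print $NF }'
}

# Solaris only: force a LIP on every port in turn, waiting 60 seconds after each.
forcelip_all() {
  for dev in $(fcinfo hba-port | hba_devices); do
    echo "forcelip $dev"
    luxadm -e forcelip "$dev"
    sleep 60
  done
}
```

Running the ports one by one keeps the same pacing as the manual commands above.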

Remove the graphical screen with progress bar in Red Hat Enterprise Linux and Suse

Edit /boot/grub/grub.conf (Red Hat) or /boot/grub/menu.lst (Suse) and, on the line that starts with kernel, remove the splash option or rhgb (Red Hat Graphical Boot).

The boot then runs in text mode. If the system is set to start a graphical login, it switches to graphical mode at that point; otherwise it stays in text mode.

Use this to see every console message during boot. It is useful when you are connected remotely and pressing Esc or F2 does not work.

root@suse9:~ # cat /boot/grub/menu.lst
# Modified by YaST2. Last modification on Mon Aug 27 18:45:43 2007

color white/blue black/light-gray
default 0
timeout 8
gfxmenu (hd0,2)/message

###Don't change this comment - YaST2 identifier: Original name: linux###
title Linux
kernel (hd0,2)/vmlinuz root=6801 vga=0x317 selinux=0 resume=/dev/cciss/c0d0p2 elevator=cfq showopts
initrd (hd0,2)/initrd

root@rhel66:~ # cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/mapper/rootvg-rootlv
# initrd /initrd-[generic-]version.img
#boot=/dev/sda
default=0
timeout=5
#splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.32-573.1.1.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-573.1.1.el6.x86_64 ro root=/dev/mapper/rootvg-rootlv rd_NO_LUKS rd_LVM_LV=rootvg/rootlv KEYBOARDTYPE=pc KEYTABLE=br-abnt2 rd_NO_MD rd_LVM_LV=rootvg/swaplv SYSFONT=latarcyrheb-sun16 crashkernel=auto console=tty1 console=ttyS1,115200 noquiet log_buf_len=3M elevator=deadline nmi_watchdog=0 rd_NO_DM LANG=en_US.UTF-8
initrd /initramfs-2.6.32-573.1.1.el6.x86_64.img

If you only need to check once, boot the kernel with the option splash=off (Suse) or without rhgb (Red Hat) by editing the kernel line in the GRUB menu. You can also add consoleblank=0 to keep the console from blanking.
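To make the permanent change without editing by hand, this is a minimal sketch for GRUB legacy files like the two above. It removes rhgb, splash and splash=... from lines that start with kernel, keeps the original indentation, and joins the remaining words with single spaces. strip_boot_splash is a hypothetical name:

```sh
# Read a GRUB legacy config on stdin; print it with rhgb/splash removed
# from kernel lines. All other lines are passed through unchanged.
strip_boot_splash() {
  awk '
    /^[ \t]*kernel[ \t]/ {
      match($0, /^[ \t]*/)               # keep the original indentation
      line = substr($0, 1, RLENGTH)
      sep = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "rhgb" || $i == "splash" || $i ~ /^splash=/) continue
        line = line sep $i
        sep = " "
      }
      print line
      next
    }
    { print }
  '
}
```

Usage: strip_boot_splash < /boot/grub/grub.conf > /tmp/grub.conf.new, check it with diff, then copy it back.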

Sun Fire E25K domain In Recovery status. How to check if it is progressing

You ran showplatform on the system controller, and the E25K domain shows the status In Recovery:

Domain configurations:
======================
Domain ID  Domain Tag  Solaris Nodename  Domain Status
A          -           -                 Powered Off
B          -           -                 Powered Off
C          -           -                 Powered Off
D          -           -                 Powered Off
E          domain5     -                 In Recovery
F          -           -                 Powered Off
G          -           -                 Powered Off
H          -           -                 Powered Off
I          -           -                 Powered Off
J          -           -                 Powered Off
K          -           -                 Powered Off
L          -           -                 Powered Off
M          -           -                 Powered Off
N          -           -                 Powered Off
O          -           -                 Powered Off
P          -           -                 Powered Off
Q          -           -                 Powered Off
R          -           -                 Powered Off

To find out why the domain is taking so long to boot, look for the running hpost process and attach truss to it. If it keeps making new system calls, POST is still progressing:

root@systemcontrollern1:~ # ps -ef | grep post
sms-svc 5398 198 0 15:48:02 pts/2 0:00 grep post
sms-dsmd 27458 26817 39 15:34:33 ? 3:14 /opt/SUNWSMS/SMS1.6/bin/hpost -d E -Q

root@systemcontrollern1:~ # truss -p 27458
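The PID lookup can be tied to the domain letter, so the grep line never gets in the way. This is a minimal sketch that assumes hpost is started with "-d <domain>", as in the ps output above. hpost_pid is a hypothetical name. It needs an awk that accepts -v (on Solaris, nawk or /usr/xpg4/bin/awk if /usr/bin/awk rejects it):

```sh
# Read "ps -ef" output on stdin and print the PID of the hpost process
# running for the domain given as $1 (for example E).
hpost_pid() {
  awk -v dom="$1" '
    {
      seen = 0
      for (i = 1; i < NF; i++) {
        if ($i == "hpost" || $i ~ /\/hpost$/) seen = 1
        else if (seen && $i == "-d" && $(i + 1) == dom) { print $2; next }
      }
    }'
}
```

Usage: truss -p "$(ps -ef | hpost_pid E)".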

Failure: The event flow is broken on solaris.setaoffice.com for the last 60min. Please follow the instructions.

ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls do not take ownership until further notice.
Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : major
OM Server Time: 2015-04-01 14:37:24
Message : Failure: The event flow is broken on solaris.setaoffice.com for the last 60min. Please follow the instructions.
Msg Group : ITO
Application : HealthCheck
Object : OVO-agent
Event Type :
not_found

Instance Name :
not_found

Instruction : (Please carry out instructions in order and record output in ticket)

1) Check if there is any maintenance ongoing for the respective system. Set a scheduled outage if yes.

2) Check if the system is reachable – log in to the server in question and ping the OVO management server. If it is not pingable, inform the second line or technical lead.

3) If the system is reachable, generate a test alert on the node in question.

4) If the test alert is not received, do opcagt -kill; then remove the temp queue files (/var/opt/OV/tmp/OpC/*q on Unix, or …\tmp\OpC\*q on Windows); then do opcagt -start on the system. Generate another test alert on the node in question.

5) If the test alert is not received, refer the call to the OVO monitoring support team.

Check which host is the HPOM manager and try to ping it:

root@solaris:/ # /opt/OV/bin/ovconfget | grep OPC_PRIMARY_MGR
OPC_PRIMARY_MGR=hpommanager.omc.hp.com

root@solaris:/ # ping hpommanager.omc.hp.com
hpommanager.omc.hp.com is alive

Also check the status with bbcutil. If that is okay as well, the problem is that the manager cannot reach the managed host.

root@solaris:/ # bbcutil -ping https://hpommanager.omc.hp.com

https://hpommanager.omc.hp.com: status=eServiceOK
coreID=d2ebdec9-48ff-40ec-bf76-eb233981c3a0
bbcV=11.14.014 appN=ovbbccb appV=unknown version
conn=9 time=1199 ms
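These checks can be chained on the managed node. This is a minimal sketch that reuses the commands above. It assumes Solaris ping, which returns after one reply (on Linux you would need -c), and that bbcutil is in the PATH. primary_mgr, bbc_ok and check_manager are hypothetical names:

```sh
# Read "ovconfget" output on stdin and print the primary manager host name.
primary_mgr() {
  sed -n 's/^OPC_PRIMARY_MGR=//p'
}

# Read "bbcutil -ping" output on stdin; succeed only if status=eServiceOK appears.
bbc_ok() {
  grep 'status=eServiceOK' >/dev/null   # no grep -q on Solaris /usr/bin/grep
}

# Run on the managed node: find the manager, ping it, then check BBC communication.
check_manager() {
  mgr=$(/opt/OV/bin/ovconfget | primary_mgr)
  [ -n "$mgr" ] || { echo "OPC_PRIMARY_MGR not set"; return 1; }
  ping "$mgr" || return 1
  if bbcutil -ping "https://$mgr" | bbc_ok; then
    echo "node -> $mgr OK; the manager is likely having trouble reaching this node"
  else
    echo "node -> $mgr communication failed"
    return 1
  fi
}
```

If check_manager reports OK, step 4 of the instructions (restarting the agent and clearing the queue files) is the next thing to try.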