Tag Archives: scstat

Solaris Cluster resource group is online but resource failed to start

Switching the resource group to the node returned the following error:

root@solaris_node1:/ # scswitch -z -g dbs-oracle-pbr070-rg -h sc03-app02:solaris_node1
scswitch: (C450598) On node sc03-app02:solaris_node1, resource group dbs-oracle-pbr070-rg is online but resource lsn-orapbr070-01-res failed to start

The listener resource did not start:

root@solaris_node1:~ # scstat -g | grep pbr070
Resources: dbs-oracle-pbr070-rg lhn-orapbr070-01-res lsn-orapbr070-01-res dbs-orapbr070-01-res zfs-orapbr070-01-res
Group: dbs-oracle-pbr070-rg sc03-app02:solaris_node1 Online faulted No
Group: dbs-oracle-pbr070-rg sc03-app04:solaris_node2 Offline No
Resource: lhn-orapbr070-01-res sc03-app02:solaris_node1 Online Online – LogicalHostname online.
Resource: lhn-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline – LogicalHostname offline.
Resource: lsn-orapbr070-01-res sc03-app02:solaris_node1 Start failed Faulted
Resource: lsn-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline
Resource: dbs-orapbr070-01-res sc03-app02:solaris_node1 Online Online
Resource: dbs-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline
Resource: zfs-orapbr070-01-res sc03-app02:solaris_node1 Online Online
Resource: zfs-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline
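With many resources in a group, the faulted one is easy to miss in the listing. A minimal sketch of a filter that prints only the faulted resources and the node they failed on; the function name faulted_resources is made up for this example, and it assumes the column layout shown in the scstat -g output above. On Solaris 10 the awk call may need to be nawk or /usr/xpg4/bin/awk.

```shell
# Print "resource node" for every Resource: line whose status is Faulted.
# Reads scstat -g output on stdin, so it also works on saved output.
faulted_resources() {
    awk '$1 == "Resource:" && / Faulted( |$)/ { print $2, $3 }'
}

# Possible use on a cluster node:
#   scstat -g | faulted_resources
```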

Stopping the individual resource

root@solaris_node1:~ # scswitch -n -j lsn-orapbr070-01-res

root@solaris_node1:~ # scstat -g | grep pbr070
Resources: dbs-oracle-pbr070-rg lhn-orapbr070-01-res lsn-orapbr070-01-res dbs-orapbr070-01-res zfs-orapbr070-01-res
Group: dbs-oracle-pbr070-rg sc03-app02:solaris_node1 Online No
Group: dbs-oracle-pbr070-rg sc03-app04:solaris_node2 Offline No
Resource: lhn-orapbr070-01-res sc03-app02:solaris_node1 Online Online – LogicalHostname online.
Resource: lhn-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline – LogicalHostname offline.
Resource: lsn-orapbr070-01-res sc03-app02:solaris_node1 Offline Offline
Resource: lsn-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline
Resource: dbs-orapbr070-01-res sc03-app02:solaris_node1 Online Online
Resource: dbs-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline
Resource: zfs-orapbr070-01-res sc03-app02:solaris_node1 Online Online
Resource: zfs-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline

Starting it again returned the same error:

root@solaris_node1:~ # scswitch -e -j lsn-orapbr070-01-res
scswitch: (C450598) On node sc03-app02:solaris_node1, resource group dbs-oracle-pbr070-rg is online but resource lsn-orapbr070-01-res failed to start

root@solaris_node1:~ # scstat -g | grep pbr070
Resources: dbs-oracle-pbr070-rg lhn-orapbr070-01-res lsn-orapbr070-01-res dbs-orapbr070-01-res zfs-orapbr070-01-res
Group: dbs-oracle-pbr070-rg sc03-app02:solaris_node1 Online faulted No
Group: dbs-oracle-pbr070-rg sc03-app04:solaris_node2 Offline No
Resource: lhn-orapbr070-01-res sc03-app02:solaris_node1 Online Online – LogicalHostname online.
Resource: lhn-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline – LogicalHostname offline.
Resource: lsn-orapbr070-01-res sc03-app02:solaris_node1 Start failed Faulted
Resource: lsn-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline
Resource: dbs-orapbr070-01-res sc03-app02:solaris_node1 Online Online
Resource: dbs-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline
Resource: zfs-orapbr070-01-res sc03-app02:solaris_node1 Online Online
Resource: zfs-orapbr070-01-res sc03-app04:solaris_node2 Offline Offline

I asked the DBA to start the listener manually. It turned out that another listener with the same name was already running on the host. The DBA stopped that listener and then started the resource through the cluster.
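A quick way to check for this situation before enabling the resource is to count the listener processes that already carry the name. This is only a sketch: the function name count_listeners and the listener name LISTENER_PBR070 are hypothetical, and it assumes the usual process line of the form ".../tnslsnr NAME -inherit". The awk -v option is not available in the old /usr/bin/awk on Solaris 10, so nawk or /usr/xpg4/bin/awk may be needed there.

```shell
# Count tnslsnr processes whose listener name equals $1.
# Reads ps -ef style output on stdin.
count_listeners() {
    awk -v name="$1" '
        {
            # Look for a field ending in "tnslsnr" followed by the listener name.
            for (i = 1; i < NF; i++)
                if ($i ~ /tnslsnr$/ && $(i + 1) == name) n++
        }
        END { print n + 0 }'
}

# Possible use (LISTENER_PBR070 is a made-up name):
#   ps -ef | count_listeners LISTENER_PBR070
```

A result other than 0 while the cluster resource is offline suggests that a listener was started outside the cluster's control.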

Sun Cluster Resource Start failed Faulted

Checking the status, I saw that the resource lsn-orapvtl-01-res running on solaris10node2 had failed to start:

root@solaris10node1:~ # scstat -g | grep pvtl
Resources: dbs-oracle-pvtl-rg ddg-arcpvtl-01-res ddg-orapvtl-01-res ddg-cmpvtl-01-res lhn-orapvtl-01-res lsn-orapvtl-01-res dbs-orapvtl-01-res
Group: dbs-oracle-pvtl-rg solaris10node2 Online faulted No
Group: dbs-oracle-pvtl-rg solaris10node1 Offline No
Resource: ddg-arcpvtl-01-res solaris10node2 Online Online
Resource: ddg-arcpvtl-01-res solaris10node1 Offline Offline
Resource: ddg-orapvtl-01-res solaris10node2 Online Online
Resource: ddg-orapvtl-01-res solaris10node1 Offline Offline
Resource: ddg-cmpvtl-01-res solaris10node2 Online Online
Resource: ddg-cmpvtl-01-res solaris10node1 Offline Offline
Resource: lhn-orapvtl-01-res solaris10node2 Online Online – LogicalHostname online.
Resource: lhn-orapvtl-01-res solaris10node1 Offline Offline
Resource: lsn-orapvtl-01-res solaris10node2 Start failed Faulted
Resource: lsn-orapvtl-01-res solaris10node1 Offline Offline
Resource: dbs-orapvtl-01-res solaris10node2 Online Online
Resource: dbs-orapvtl-01-res solaris10node1 Offline Offline

I stopped the resource

root@solaris10:~ # scswitch -n -j lsn-orapvtl-01-res

root@solaris10node1:~ # scstat -g | grep pvtl
Resources: dbs-oracle-pvtl-rg ddg-arcpvtl-01-res ddg-orapvtl-01-res ddg-cmpvtl-01-res lhn-orapvtl-01-res lsn-orapvtl-01-res dbs-orapvtl-01-res
Group: dbs-oracle-pvtl-rg solaris10node2 Online No
Group: dbs-oracle-pvtl-rg solaris10node1 Offline No
Resource: ddg-arcpvtl-01-res solaris10node2 Online Online
Resource: ddg-arcpvtl-01-res solaris10node1 Offline Offline
Resource: ddg-orapvtl-01-res solaris10node2 Online Online
Resource: ddg-orapvtl-01-res solaris10node1 Offline Offline
Resource: ddg-cmpvtl-01-res solaris10node2 Online Online
Resource: ddg-cmpvtl-01-res solaris10node1 Offline Offline
Resource: lhn-orapvtl-01-res solaris10node2 Online Online – LogicalHostname online.
Resource: lhn-orapvtl-01-res solaris10node1 Offline Offline
Resource: lsn-orapvtl-01-res solaris10node2 Offline Offline
Resource: lsn-orapvtl-01-res solaris10node1 Offline Offline
Resource: dbs-orapvtl-01-res solaris10node2 Online Online
Resource: dbs-orapvtl-01-res solaris10node1 Offline Offline

Then started it

root@solaris10:~ # scswitch -e -j lsn-orapvtl-01-res

root@solaris10:~ # scstat -g | grep pvtl
Resources: dbs-oracle-pvtl-rg ddg-arcpvtl-01-res ddg-orapvtl-01-res ddg-cmpvtl-01-res lhn-orapvtl-01-res lsn-orapvtl-01-res dbs-orapvtl-01-res
Group: dbs-oracle-pvtl-rg solaris10node2 Online No
Group: dbs-oracle-pvtl-rg solaris10node1 Offline No
Resource: ddg-arcpvtl-01-res solaris10node2 Online Online
Resource: ddg-arcpvtl-01-res solaris10node1 Offline Offline
Resource: ddg-orapvtl-01-res solaris10node2 Online Online
Resource: ddg-orapvtl-01-res solaris10node1 Offline Offline
Resource: ddg-cmpvtl-01-res solaris10node2 Online Online
Resource: ddg-cmpvtl-01-res solaris10node1 Offline Offline
Resource: lhn-orapvtl-01-res solaris10node2 Online Online – LogicalHostname online.
Resource: lhn-orapvtl-01-res solaris10node1 Offline Offline
Resource: lsn-orapvtl-01-res solaris10node2 Online Online
Resource: lsn-orapvtl-01-res solaris10node1 Offline Offline
Resource: dbs-orapvtl-01-res solaris10node2 Online Online
Resource: dbs-orapvtl-01-res solaris10node1 Offline Offline
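The disable/enable cycle used above can be wrapped in a small helper. This is a sketch only: the names run and bounce_resource and the DRY_RUN variable are invented for this example, while the scswitch -n -j and scswitch -e -j calls are the ones shown in the session above.

```shell
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Disable a resource and enable it again; stop if the disable step fails.
bounce_resource() {
    res="$1"
    run scswitch -n -j "$res" || return 1   # disable (stop) the resource
    run scswitch -e -j "$res"               # enable (start) it again
}

# Possible use:
#   DRY_RUN=1; bounce_resource lsn-orapvtl-01-res
```

Running it with DRY_RUN=1 first shows what would be executed; checking scstat -g afterwards, as in the session above, confirms whether the resource actually came online.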

scstat libsecurity: create of rpc handle to program rgmd_receptionist (100141) failed, will not retry

root@solaris10:/ # scstat -g
libsecurity: create of rpc handle to program rgmd_receptionist (100141) failed, will not retry
scstat: unexpected error.

This node runs Sun Cluster 3.1, which returned the libsecurity error above.

root@solaris10:/ # pkginfo -l SUNWscnmr
PKGINST: SUNWscnmr
NAME: Sun Cluster name (Root)
CATEGORY: application
ARCH: sparc
VERSION: 3.1.0,REV=2005.07.18.14.37
BASEDIR: /
VENDOR: Sun Microsystems, Inc.
DESC: Sun Cluster name (Root)
PSTAMP: 07/18/2005.14:43:46
INSTDATE: Oct 25 2008 20:23
HOTLINE: Please contact your local service provider
STATUS: completely installed
FILES: 3 installed pathnames
2 shared pathnames
2 directories
1 blocks used (approx)

The server was reported as unavailable, but I was able to log in, so I checked the run level:

root@solaris10:/ # who -r
. run-level 3 Oct 6 13:20 3 0 S

Running svcs -xv showed numerous services that had not started yet:

root@solaris10:/ # svcs -xv
svc:/milestone/multi-user:default (multi-user milestone)
State: offline since Tue 06 Oct 2015 01:23:54 PM BRT
Reason: Start method is running.
See: http://sun.com/msg/SMF-8000-C4
See: man -M /usr/share/man -s 1M init
See: /var/svc/log/milestone-multi-user:default.log
Impact: 23 dependent services are not running:
svc:/system/webconsole:console
svc:/system/boot-config:default
svc:/application/stosreg:default
svc:/application/sthwreg:default
svc:/application/management/common-agent-container-1:default
svc:/system/cluster/cl-svc-enable:default
svc:/system/cluster/spm:default
svc:/system/cluster/cl-svc-cluster-milestone:default
svc:/system/cluster/scdpm:default
svc:/system/cluster/rpc-pmf:default
svc:/system/cluster/rgm:default
svc:/system/cluster/scsymon-srv:default
svc:/system/cluster/rpc-fed:default
svc:/system/cluster/pnm:default
svc:/system/cluster/cl-event:default
svc:/system/cluster/cl-eventlog:default
svc:/system/cluster/cl-ccra:default
svc:/milestone/multi-user-server:default
svc:/system/basicreg:default
svc:/system/zones:default
svc:/application/graphical-login/cde-login:default
svc:/system/vxvm/vxvm-recover:default
svc:/application/cde-printinfo:default

svc:/application/print/ipp-listener:default (Internet Print Protocol Listening Service)
State: maintenance since Tue 06 Oct 2015 01:22:52 PM BRT
Reason: Start method died on Killed (9).
See: http://sun.com/msg/SMF-8000-KS
See: man -M /usr/share/man -s 4 mod_ipp
See: /var/svc/log/application-print-ipp-listener:default.log
Impact: This service is not running.

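A long time passed and the services still had not started. The Service Management Facility (SMF) log for the multi-user milestone showed that it was still executing a legacy init script. After I killed that script, the remaining services started being processed.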

root@solaris10:/ # cat /var/svc/log/milestone-multi-user:default.log
Executing legacy init script "/etc/rc2.d/S74osddownt".
OSD DownTime is being started.
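To find which legacy script the milestone is stuck on, the last "Executing legacy init script" entry in the log is the one to look at. A minimal sketch, assuming the log line format shown above; the function name last_legacy_script is made up for this example.

```shell
# Print the path of the last legacy init script named in an SMF log.
# Reads the log on stdin.
last_legacy_script() {
    sed -n 's|^Executing legacy init script[^/]*\(/[A-Za-z0-9_./-]*\).*|\1|p' | tail -1
}

# Possible use on the affected host:
#   script=$(last_legacy_script < /var/svc/log/milestone-multi-user:default.log)
#   pgrep -fl "$script"    # list the matching processes before deciding to kill one
```

Listing the processes with pgrep before killing anything leaves room to check what the script is waiting on.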