Category Archives: Solaris

HPOM – Checking fmadm faulty and there is a pool without name

Node : sc02-app03.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : major
OM Server Time: 2018-02-06 11:11:01
Message : UXMON: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Msg Group : OS
Application : SOL_mon
Object : FMT
Event Type :
not_found

Instance Name :
not_found

Instruction : “The Fault Management agent has identified a HW or OS related problem with the severity presented by the ticket.
The problem(s) can be viewed and managed with the command – fmdump
To get a better understanding of the problem and on how to resolve it, locate the event that generated
the ticket in the syslog file /var/adm/messages, a URL will be found (http://sun.com/msg/xxx-nnnn-yy),
follow the link using your Oracle portal account for instructions.”
EventDataSource :
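
The instruction mentions fmdump; a quick way to tie the ticket to a fault event is to list the fault log and then dump the matching event verbosely (a sketch using the event ID from the fmadm output below; output omitted):

root@sc02-app03:~# fmdump
root@sc02-app03:~# fmdump -v -u 4577aa7a-2b00-6eb6-f139-bf2848542fb2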

Checking fmadm faulty, I see a pool without a name:

root@sc02-app03:~# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 06 15:59:15 4577aa7a-2b00-6eb6-f139-bf2848542fb2 ZFS-8000-HC    Major

Problem Status : open
Diag Engine : zfs-diagnosis / 1.0
System
Manufacturer : unknown
Name : -
Part_Number : unknown
Serial_Number : unknown

System Component
Manufacturer : Oracle-Corporation
Name : ORCL,SPARC-T5-8
Part_Number : unknown
Serial_Number : unknown
Host_ID : (null)

----------------------------------------
Suspect 1 of 1 :
Problem class : fault.fs.zfs.io_failure_wait
Certainty : 100%
Affects : zfs://pool=9fbf9b5d11236d0a
Status : faulted but still in service

Resource
FMRI : "zfs://pool=9fbf9b5d11236d0a"
Status : faulted but still in service

Description : ZFS pool '' has experienced currently unrecoverable I/O failures.

Response : No automated response will occur.

Impact : Read and write I/Os cannot be serviced.

Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Make sure the affected devices are connected, then run 'zpool
clear'. Please refer to the associated reference document at
http://support.oracle.com/msg/ZFS-8000-HC for the latest service
procedures and policies regarding this diagnosis.

To solve this problem, you need to reboot the server.
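
If a reboot cannot be scheduled immediately, the fault can at least be acknowledged with the event ID from the output above, the same way as in the next post (a sketch; the fault will reassert while the I/O failure persists):

root@sc02-app03:~# fmadm repair 4577aa7a-2b00-6eb6-f139-bf2848542fb2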

How to clear fmadm faulty entries in Solaris 10

Clear fmadm log

root@sc02-app04:~ # fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 06 10:17:04 0ad5260a-f9e0-ef7a-c91f-efcc76b9b164 ZFS-8000-HC    Major

Host : sc02-app04
Platform : ORCL,SPARC-T5-8 Chassis_id :
Product_sn :

Fault class : fault.fs.zfs.io_failure_wait
Affects : zfs://pool=prd171
faulted but still in service
Problem in : zfs://pool=prd171
faulted but still in service

Description : The ZFS pool has experienced currently unrecoverable I/O
failures.

Response : No automated response will be taken.

Impact : Read and write I/Os cannot be serviced.

Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Make sure the affected devices are connected, then run 'zpool
clear'. Please refer to the associated reference document at
http://sun.com/msg/ZFS-8000-HC for the latest service procedures
and policies regarding this diagnosis.

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 06 10:17:03 fe2302d7-99c9-c0c8-d54b-92495cc94fc9 ZFS-8000-D3    Major

Host : sc02-app04
Platform : ORCL,SPARC-T5-8 Chassis_id :
Product_sn :

Fault class : fault.fs.zfs.device
Affects : zfs://pool=prd171/vdev=5d2cdc446e947471
faulted and taken out of service
Problem in : zfs://pool=prd171/vdev=5d2cdc446e947471
faulted and taken out of service

Description : A ZFS device failed.

Response : No automated response will occur.

Impact : Fault tolerance of the pool may be compromised.

Action : Run 'zpool status -x' for more information. Please refer to the
associated reference document at http://sun.com/msg/ZFS-8000-D3
for the latest service procedures and policies regarding this
diagnosis.

root@sc02-app04:~ # fmadm repair 0ad5260a-f9e0-ef7a-c91f-efcc76b9b164
fmadm: recorded repair to 0ad5260a-f9e0-ef7a-c91f-efcc76b9b164
root@sc02-app04:~ # fmadm repair fe2302d7-99c9-c0c8-d54b-92495cc94fc9
fmadm: recorded repair to fe2302d7-99c9-c0c8-d54b-92495cc94fc9

Clear ereports and resource cache

root@sc02-app04:~ # cd /var/fm/fmd
root@sc02-app04:/var/fm/fmd # rm e* f* c*/eft/* r*/*

Clearing out FMA files with no reboot needed

root@sc02-app04:~ # svcadm disable -s svc:/system/fmd:default
root@sc02-app04:~ # cd /var/fm/fmd
root@sc02-app04:/var/fm/fmd # find /var/fm/fmd -type f -exec ls {} \;
/var/fm/fmd/topo/90ab82b5-08eb-6f9f-9a9a-af2975a2808b/hc-topology.xml
/var/fm/fmd/topo/6b4eba63-3576-e155-ac3b-8f6609f0b968/hc-topology.xml
/var/fm/fmd/topo/1badc01d-82b9-6203-9440-9dd440aedaca/hc-topology.xml
/var/fm/fmd/topo/f32b13d0-63a1-4b5a-e811-bfda6bddcba1/hc-topology.xml
/var/fm/fmd/topo/c4824832-ced3-672a-ec69-a9490f94d2c0/hc-topology.xml
/var/fm/fmd/ckpt/etm/etm
/var/fm/fmd/ckpt/zfs-diagnosis/zfs-diagnosis
root@sc02-app04:/var/fm/fmd # find /var/fm/fmd -type f -exec rm {} \;
root@sc02-app04:/var/fm/fmd # svcadm enable svc:/system/fmd:default

Checking fmadm faulty

root@sc02-app04:~ # fmadm faulty
root@sc02-app04:~ #

Reset the fmd serd modules

root@sc02-app04:~ # fmadm reset cpumem-diagnosis
fmadm: failed to reset module cpumem-diagnosis: specified module is not loaded in fault manager
root@sc02-app04:~ # fmadm reset cpumem-retire
fmadm: cpumem-retire module has been reset
root@sc02-app04:~ # fmadm reset eft
fmadm: eft module has been reset
root@sc02-app04:~ # fmadm reset io-retire
fmadm: io-retire module has been reset
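
The cpumem-diagnosis failure above just means that module is not loaded in this fault manager. The loaded modules can be listed first, so only those get reset (fmadm config is a standard subcommand; output omitted):

root@sc02-app04:~ # fmadm config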

Source: https://saifulaziz.com/2011/12/26/how-to-clear-fmadm-log-or-fma-faults-log/

UXMON: Service /application/pkg/zones-proxy-client status is maintenance, check with svcs -xv

ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls. ensure your Ticket List/View includes the “Assignee” column, monitor this ticket until the user “ABOPERATOR” is no longer assigned, BEFORE you start work on this ticket.
Node : localzone.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : minor
OM Server Time: 2017-09-24 22:32:27
Message : UXMON: Service /application/pkg/zones-proxy-client status is maintenance, check with svcs -xv
Msg Group : OS
Application : svcsmon
Object : svcs
Event Type :
not_found

Instance Name :
not_found

Instruction : The svcsmon has detected solaris service status

Please, for details, browse the /var/opt/OV/log/OpC/svcs_mon.log
The configuration file uses to be /var/opt/OV/conf/OpC/svcs_mon.cfg

On global zone

root@globalzone:~ # pkg publisher
PUBLISHER TYPE STATUS P LOCATION
solaris origin online F file:///net/192.168.252.12/export/IPS-repos/solaris11/repo/
solaris origin online F file:///var/scmuidrs/idr2142.1.p5p/
solaris origin online F file:///var/scmuidrs/idr2160.2.p5p/
solaris origin online F file:///var/scmuidrs/idr2193.2.p5p/
solaris origin online F file:///var/scmuidrs/idr2194.1.p5p/
solaris origin online F file:///var/scmuidrs/idr2238.1.p5p/
exa-family origin online F file:///net/192.168.252.12/export/IPS-repos/exafamily/repo/
ha-cluster origin online F file:///net/192.168.252.12/export/IPS-repos/osc4/repo/

root@globalzone:~ # svcadm disable zones-proxyd system-repository; svcadm enable system-repository zones-proxyd; sleep 30
root@globalzone:~ #

On local zone

root@localzone:~# svcs -a | grep /application/pkg/zones-proxy-client
maintenance 8:08:15 svc:/application/pkg/zones-proxy-client:default

root@localzone:~# svcadm disable svc:/application/pkg/zones-proxy-client:default

root@localzone:~# svcadm enable svc:/application/pkg/zones-proxy-client:default

root@localzone:~# svcs -a | grep /application/pkg/zones-proxy-client
online 14:09:45 svc:/application/pkg/zones-proxy-client:default
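
As the ticket itself suggests, svcs -xv is the final check; with the service back online it should print nothing:

root@localzone:~# svcs -xv
root@localzone:~#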

Oracle SMF Oracle Configuration Manager (OCM) svc:/system/ocm:default

Listing status of ocm service

root@solaris:~ # svcs svc:/system/ocm:default
STATE STIME FMRI
disabled Jul_17 svc:/system/ocm:default

Listing the SMF service details

root@solaris:~ # svcs -l svc:/system/ocm:default
fmri svc:/system/ocm:default
name Oracle Configuration Manager (OCM)
enabled false
state disabled
next_state none
state_time Mon Jul 17 04:48:06 2017
logfile /var/svc/log/system-ocm:default.log
restarter svc:/system/svc/restarter:default
contract_id
manifest /etc/svc/profile/generic.xml
manifest /lib/svc/manifest/system/ocm.xml
dependency require_all/none svc:/milestone/multi-user-server:default (online)
dependency require_all/error svc:/milestone/network:default (online)
dependency require_all/none svc:/system/cryptosvc (online)

Oracle Configuration Manager is used to collect client configuration information and upload it to the Oracle repository.
When OCM is enabled on this system, it goes into maintenance mode:

root@solaris:~ # svcadm enable svc:/system/ocm:default

root@solaris:~ # svcs -v svc:/system/ocm:default
STATE NSTATE STIME CTID FMRI
maintenance - 10:58:25 1081435 svc:/system/ocm:default

There are two errors in this installation. There is no proxy setup and it is missing user ocm

root@solaris:~ # cat /var/svc/log/system-ocm:default.log
[ Aug 28 10:58:09 Enabled. ]
[ Aug 28 10:58:09 Executing start method ("/lib/svc/method/svc-ocm start"). ]
/lib/svc/method/svc-ocm: starting...
OCM not registered
Collector running in connected mode
Begin anonymous registration...
Starting response file generation...
Can not create response file: Unknown Host: ccr.oracle.com: unknown error
Failed to create response file...
Failed to generate anonymous response file...
Unable to contact ccr.oracle.com. Please set your system proxy
in order to allow this system to contact Oracle for better
serviceability. See the configCCR(1M) manual page on how to set
the proxy server for Oracle Configuration Manager.

svc:/system/ocm:default has been temporarily disabled.

[ Aug 28 10:58:20 Method "start" exited with status 0. ]
[ Aug 28 10:58:20 Stopping because service disabled. ]
[ Aug 28 10:58:20 Executing stop method ("/lib/svc/method/svc-ocm stop"). ]
Stopping scheduler...
su: Unknown id: ocm
[ Aug 28 10:58:25 Method "stop" exited with status 95. ]
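
Until a proxy is configured and the ocm user exists, a reasonable interim step is to clear the maintenance state and leave the service disabled, which is how the system was running before (svcadm clear and disable are standard SMF commands):

root@solaris:~ # svcadm clear svc:/system/ocm:default
root@solaris:~ # svcadm disable svc:/system/ocm:default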

Solaris – UXMON: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major

UXMON: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major

Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : major
OM Server Time: 2017-08-12 10:27:31
Message : UXMON: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Msg Group : OS
Application : SOL_mon
Object : FMT
Event Type :
not_found

Instance Name :
not_found

Instruction : “The Fault Management agent has identified a HW or OS related problem with the severity presented by the ticket.
The problem(s) can be viewed and managed with the command – fmdump
To get a better understanding of the problem and on how to resolve it, locate the event that generated
the ticket in the syslog file /var/adm/messages, a URL will be found (http://sun.com/msg/xxx-nnnn-yy),
follow the link using your Oracle portal account for instructions.”

After running fmadm faulty, we see that there is a problem with a zpool; zpool status -x shows that pool prd027_software is having problems:

root@solaris:~ # zpool status prd027_software
pool: prd027_software
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
Run 'zpool status -v' to see device specific details.
see: http://support.oracle.com/msg/ZFS-8000-8A
scan: none requested
config:

NAME STATE READ WRITE CKSUM
prd027_software ONLINE 0 0 14.7K
c0t600507680191818C1000000000000BE9d0 ONLINE 0 0 0
c0t600507680191818C1000000000000BEAd0 ONLINE 0 0 0
c0t600507680191818C1000000000000BEBd0 ONLINE 0 0 0
c0t600507680191818C1000000000000BECd0 ONLINE 0 0 0

errors: 3 data errors, use '-v' for a list

Run zpool scrub prd027_software

root@solaris:~ # zpool scrub prd027_software

root@solaris:~ # zpool status -xv
pool: prd027_software
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://support.oracle.com/msg/ZFS-8000-8A
scan: scrub in progress since Wed Dec 31 21:00:00 1969
50.7M scanned out of 1.08T at 25.3M/s, 12h25m to go
0 repaired, 0.00% done
config:

NAME STATE READ WRITE CKSUM
prd027_software ONLINE 0 0 14.7K
c0t600507680191818C1000000000000BE9d0 ONLINE 0 0 0
c0t600507680191818C1000000000000BEAd0 ONLINE 0 0 0
c0t600507680191818C1000000000000BEBd0 ONLINE 0 0 0
c0t600507680191818C1000000000000BECd0 ONLINE 0 0 0

errors: Permanent errors have been detected in the following files:

/zones/prd027/root/usr/software/best1/Patrol3/Solaris-2-10-sparc-64/best1/7.4.00/bgs/monitor/log/prd027-bgsagent_6767.als
prd027_software/software027:
prd027_software/software027:

After the pool is scanned, check if there is still a problem

root@solaris:~ # zpool status -xv
all pools are healthy

Repairing fmadm entries

root@solaris:~ # fmadm faulty | grep "Aug"
Aug 12 11:23:22 82fe93a5-8120-657b-9e61-e33252b84d30 ZFS-8000-D3 Major
Aug 12 11:22:01 74c61e33-7c56-4aca-d707-a32ce06a9bd8 ZFS-8000-CS Major

root@solaris:~ # fmadm repair 82fe93a5-8120-657b-9e61-e33252b84d30
fmadm: recorded repair to 82fe93a5-8120-657b-9e61-e33252b84d30

root@solaris:~ # fmadm repair 74c61e33-7c56-4aca-d707-a32ce06a9bd8
fmadm: recorded repair to 74c61e33-7c56-4aca-d707-a32ce06a9bd8

root@solaris:~ # fmadm faulty
root@solaris:~ #

You can’t disable SOL_mon.

These alerts are generated by the global hardware policy, not by any configuration file, so there is no option to suppress them on the HPOM side.

Please enable suppression in the Jet tool using free-style format:

Source Type  Template Name
-----------  ----------------------------------------------------------------
Logfile      UXMON_sol_hw_syslog_PRE(1.2)

Message Text
------------
UXMON: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major

Custom Message Attributes
-------------------------
EventSource MS_OVO
EventUniqueID UXMON-HW-000376
condition_name FMD events of Fault type

Solaris 10: passwd: password is based on a reversed dictionary word.

root@solaris:~ # passwd emerson
New Password:
passwd: password is based on a reversed dictionary word.

Please try again
New Password:

Edit /etc/default/passwd and comment out the lines containing DICTIONDBDIR and DICTIONLIST

root@solaris:~ # vi /etc/default/passwd
#DICTIONDBDIR=/var/passwd
#DICTIONLIST=/usr/share/lib/dict/words
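
No restart is needed; passwd(1) reads /etc/default/passwd on each invocation, so the next attempt should accept the same password (a sketch of the expected exchange):

root@solaris:~ # passwd emerson
New Password:
Re-enter new Password:
passwd: password successfully changed for emerson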

Checking serial number of Oracle SPARC T5-8

Connect to ILOM

emerson@linux:~ $ ssh root@172.23.99.70
Password:

Oracle(R) Integrated Lights Out Manager

Version 3.2.5.6.b r103360

Copyright (c) 2015, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: ssccn1-sp

-> show /System

/System
Targets:
Open_Problems (1)
CPU_Modules
Processors
Memory
Power
Cooling
Storage
Networking
PCI_Devices
Firmware
Log

Properties:
health = Service Required
health_details = PM0 (Processor Module 0) is faulty. Type 'show /System/Open_Problems' for details.
open_problems_count = 1
type = Rack Mount
model = SuperCluster T5-8
qpart_id = Q9527
part_number = SuperCluster T5-8
serial_number = AK00300268
component_model = SPARC T5-8
component_part_number = 7087535
component_serial_number = SP00386386
system_identifier = Oracle SuperCluster T5-8 SP00386386
system_fw_version = Sun System Firmware 9.5.1.b 2015/10/01 16:33
primary_operating_system = Oracle Solaris 11.3 SPARC
primary_operating_system_detail = -
host_primary_mac_address = 00:10:e0:76:92:de
ilom_address = 172.23.99.70
ilom_mac_address = 00:10:E0:76:92:E7
locator_indicator = Off
power_state = On
actual_power_consumption = 3958 watts
action = (none)

Commands:
cd
reset
set
show
start
stop
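
Since health_details flags PM0 and the output itself suggests it, the open fault can be inspected in the same ILOM session (output omitted here):

-> show /System/Open_Problems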

Another way is to locate the manufacturing sticker on the front of the server or on its side

Solaris 9 Branded Zone was not starting ftp when running kill -HUP

I have a Solaris 9 Branded Zone

root@solaris9:/ # uname -a
SunOS solaris9 5.9 Generic_Virtual sun4v sparc sun4v

Configured to run FTP

root@solaris9:/ # grep ftp /etc/inetd.conf
# ftp telnet shell login exec tftp finger printer
# TFTPD – tftp server (primarily used for booting)
#tftp dgram udp6 wait root /usr/sbin/in.tftpd in.tftpd -s /tftpboot
ftp stream tcp6 nowait root /usr/sbin/in.ftpd in.ftpd -l

But it was not working

root@solaris9:/ # ps -ef | grep ftp
root 10137 13230 0 13:31:28 pts/4 0:00 grep ftp

root@solaris9:/ # ps -ef | grep inet
root 12579 13230 0 13:31:34 pts/4 0:00 grep inet
root 1325 12833 0 Mar 12 ? 0:00 /usr/sbin/inetd -s start

Tried kill -HUP, but port 21 was still not listening; note that the loose grep below only matches the Oracle listener on port 1521:

root@solaris9:/ # kill -HUP 1325

root@solaris9:/ # netstat -an | grep 21 | grep LISTEN
142.40.236.158.1521 *.* 0 0 1048576 0 LISTEN
142.40.236.10.1521 *.* 0 0 1048576 0 LISTEN

Stopped and started inetsvc

root@solaris9:/ # /etc/init.d/inetsvc stop
root@solaris9:/ # /etc/init.d/inetsvc start

root@solaris9:/ # ps -ef | grep inet
root 12098 12833 0 13:49:02 ? 0:00 /usr/sbin/inetd -s
root 15358 3734 0 13:49:05 pts/4 0:00 grep inet

FTP working again

root@solaris9:/ # netstat -an | grep 21 | grep LISTEN
142.40.236.158.1521 *.* 0 0 1048576 0 LISTEN
142.40.236.10.1521 *.* 0 0 1048576 0 LISTEN
*.21 *.* 0 0 1048576 0 LISTEN
*.21 *.* 0 0 1048576 0 LISTEN
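
A tighter pattern avoids matching the Oracle listener on port 1521 when checking for the FTP daemon (one possible filter, anchored on the local-address column):

root@solaris9:/ # netstat -an | grep '\*\.21 '
*.21 *.* 0 0 1048576 0 LISTEN
*.21 *.* 0 0 1048576 0 LISTEN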

UXMON: SSHD Daemon is not running or not doing it properly, please check

Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : normal
OM Server Time: 2016-09-10 08:03:10
Message : UXMON: SSHD Daemon is not running or not doing it properly, please check
Msg Group : OS
Application : sshd_mon
Object : sshd
Event Type :
not_found

Instance Name :
not_found

Instruction : It has been detected an SSH installation but the SSHD is not running
Please check SSH status, because it might happen also there are still some ssh spawned processes running but the father has died.

Note that if the SSH is not available this might prevent users log in the server and even impact some applications.

HPOM is complaining that sshd is not running, but it obviously is running, because you are connected to the server using ssh

root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon is running now, pid=2250
Fri Sep 23 11:45:46 2016 : SSHDMON: SSHD - Not running
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:45:46 2016 : INFO : UXMONsshdmon end, pid=2250

Check directory /var/run

root@solaris:/var/run # ls -la
total 16
drwxr-xr-x 4 root other 5 Sep 23 11:50 .
drwxr-xr-x 44 root sys 50 Aug 16 11:04 ..
-rw------- 1 root root 6 Jul 7 11:24 ds_agent.pid
drwxr-xr-x 13 root root 13 Aug 10 15:33 install_engine
drwx--x--x 2 root sys 2 Jul 6 14:27 sudo

It should have many files in /var/run; on a healthy system the listing looks like this:

root@solaris:/var/run # ls -l
total 272
-rw------- 1 root root 0 Sep 10 21:27 AdDrEm.lck
drwxr-xr-x 3 root sys 183 Sep 10 21:43 cacao
-rw-rw-rw- 1 root bin 14 Sep 23 09:20 cdrom_rcm.conf
drwxr-xr-x 2 daemon daemon 183 Sep 23 12:18 daemon
-rw-r----- 1 root root 6 Sep 23 10:41 did_reloader.lock
-rw------- 1 root root 5 Sep 10 21:27 ds_agent.pid
Drw-r----- 1 root root 0 Sep 10 21:28 event_listener_proxy_door
Drw-r--r-- 1 root root 0 Sep 10 21:40 fed_doorglobal
Drw-r--r-- 1 root root 0 Sep 10 21:27 hotplugd_door
Drw-r--r-- 1 root root 0 Sep 10 21:28 ifconfig_proxy_doorglobal
-rw------- 1 root root 0 Sep 10 21:26 ipsecconf.lock
Dr--r--r-- 1 daemon daemon 0 Sep 10 21:26 kcfd_door
-rw------- 1 root root 0 Sep 14 09:07 lockf_raidctl
Dr--r--r-- 1 root root 0 Sep 10 21:26 name_service_door
-rw-r--r-- 1 root root 8 Sep 10 21:40 nfs4_domain
drwxr-xr-x 2 root root 179 Sep 10 21:40 pcmcia
Dr--r--r-- 1 root root 0 Sep 10 21:26 picld_door
Drw-r--r-- 1 root root 0 Sep 10 21:30 pmfd_doorglobal
-rw-r--r-- 1 root sys 58 Sep 10 21:30 psn
Dr-------- 1 root root 0 Sep 10 21:26 rcm_daemon_door
-rw-r--r-- 1 root root 0 Sep 10 21:26 rcm_daemon_lock
-rw------- 1 root root 1068 Sep 10 21:26 rcm_daemon_state
Drw-r--r-- 1 root root 0 Sep 10 21:40 rgmd_receptionist_doorglobal
drwxrwxrwt 2 root root 186 Sep 10 21:27 rpc_door
drwx------ 2 root root 182 Sep 10 21:27 smc898
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid
drwx--x--x 3 root sys 176 Sep 10 21:31 sudo
drwxr-xr-x 3 root root 191 Sep 10 21:26 sysevent_channels
Drw-r--r-- 1 root root 0 Sep 10 21:30 sysevent_proxy_doorglobal
-rw-r--r-- 1 root root 5 Sep 10 21:27 syslog.pid
Drw-r--r-- 1 root root 0 Sep 10 21:27 syslog_door
-rw-r--r-- 1 root root 8192 Sep 10 21:26 tzsync
drwx------ 2 root root 2625 Sep 23 10:26 zones
Drw-r--r-- 1 root root 0 Sep 10 21:30 zoneup_doorglobal

Fixing the issue behind the ticket. Check the sshd processes running as root:

root@solaris:/var/run # ps -ef | grep ssh | grep root
root 8047 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17380 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 5570 13878 0 08:08:40 ? 0:00 /usr/lib/ssh/sshd
root 13877 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 1003 13878 0 09:17:01 ? 0:00 /usr/lib/ssh/sshd
root 13903 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 60966 13918 0 00:03:07 ? 0:00 /usr/lib/ssh/sshd
root 48654 13878 0 10:13:22 ? 0:00 /usr/lib/ssh/sshd
root 13918 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 17389 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 39554 13878 0 09:21:51 ? 0:00 /usr/lib/ssh/sshd
root 64681 1 0 11:25:02 ? 0:00 /usr/lib/ssh/sshd
root 11912 13878 0 09:29:14 ? 0:00 /usr/lib/ssh/sshd
root 56172 13878 0 11:54:55 ? 0:00 /usr/lib/ssh/sshd
root 17386 13924 0 00:07:02 ? 0:00 /usr/lib/ssh/sshd
root 34708 13878 0 08:51:07 ? 0:00 /usr/lib/ssh/sshd
root 60201 13878 0 09:27:36 ? 0:00 /usr/lib/ssh/sshd
root 55272 1 0 11:54:33 ? 0:00 /usr/lib/ssh/sshd
root 5850 13878 0 08:08:47 ? 0:00 /usr/lib/ssh/sshd
root 9865 44290 0 11:56:17 pts/4 0:00 grep ssh
root 13924 1 0 Sep 21 ? 0:00 /usr/lib/ssh/sshd
root 13878 1 0 Sep 21 ? 0:01 /usr/lib/ssh/sshd

Creating file /var/run/sshd.pid with the PID of the sshd daemon (8047, one of the parent sshd processes):

echo 8047 > /var/run/sshd.pid

root@solaris:/var/run # ls -l sshd.pid
-rw-r--r-- 1 root root 5 Sep 10 21:27 sshd.pid
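
If sshd is SMF-managed (Solaris 10 and later), the PID belonging to the service contract can be confirmed instead of picking one from ps (a sketch; output varies per system):

root@solaris:/var/run # svcs -p svc:/network/ssh:default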

sshdmon does not complain anymore

root@solaris:~ # /var/opt/OV/bin/instrumentation/UXMONbroker -check sshdmon
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon is running now, pid=18095
mv: /dev/null and /dev/null are identical
Fri Sep 23 11:58:15 2016 : INFO : UXMONsshdmon end, pid=18095

Solaris Volume Manager – Delete replicas of the metadevice state database

On this Solaris server, one of the disks needs replacement:

root@solaris # echo | format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
0. c0t2d0 <drive not available>
/pci@1f,0/pci@1,1/scsi@2/sd@2,0
1. c0t3d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
/pci@1f,0/pci@1,1/scsi@2/sd@3,0
Specify disk (enter its number): Specify disk (enter its number):

Checking replicas of the metadevice state database

root@solaris # metadb
flags first blk block count
M p 16 unknown /dev/dsk/c0t2d0s4
M p 8208 unknown /dev/dsk/c0t2d0s4
M p 16400 unknown /dev/dsk/c0t2d0s4
M p 16 unknown /dev/dsk/c0t2d0s5
M p 8208 unknown /dev/dsk/c0t2d0s5
M p 16400 unknown /dev/dsk/c0t2d0s5
a m p lu 16 8192 /dev/dsk/c0t3d0s4
a p l 8208 8192 /dev/dsk/c0t3d0s4
a p l 16400 8192 /dev/dsk/c0t3d0s4
a p l 16 8192 /dev/dsk/c0t3d0s5
a p l 8208 8192 /dev/dsk/c0t3d0s5
a p l 16400 8192 /dev/dsk/c0t3d0s5

Deleting the metadevice state database replicas on the slices of the bad disk, starting with slice 4. The 'Bad address' error is expected since the disk is unreadable; the replica records are removed anyway, as the next listing shows:

root@solaris # metadb -d /dev/dsk/c0t2d0s4
metadb: solaris: Bad address

root@solaris # metadb
flags first blk block count
M p 16 unknown /dev/dsk/c0t2d0s5
M p 8208 unknown /dev/dsk/c0t2d0s5
M p 16400 unknown /dev/dsk/c0t2d0s5
a m p lu 16 8192 /dev/dsk/c0t3d0s4
a p l 8208 8192 /dev/dsk/c0t3d0s4
a p l 16400 8192 /dev/dsk/c0t3d0s4
a p l 16 8192 /dev/dsk/c0t3d0s5
a p l 8208 8192 /dev/dsk/c0t3d0s5
a p l 16400 8192 /dev/dsk/c0t3d0s5

And then slice 5:

root@solaris # metadb -d /dev/dsk/c0t2d0s5
metadb: solaris: Bad address

root@solaris # metadb
flags first blk block count
a m p lu 16 8192 /dev/dsk/c0t3d0s4
a p l 8208 8192 /dev/dsk/c0t3d0s4
a p l 16400 8192 /dev/dsk/c0t3d0s4
a p l 16 8192 /dev/dsk/c0t3d0s5
a p l 8208 8192 /dev/dsk/c0t3d0s5
a p l 16400 8192 /dev/dsk/c0t3d0s5
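
Once the disk has been physically replaced and repartitioned with the same slice layout, the replicas can be recreated on the new disk (a sketch, assuming three replicas per slice as before):

root@solaris # metadb -a -c 3 c0t2d0s4
root@solaris # metadb -a -c 3 c0t2d0s5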
