Category: Oracle RAC

Adding new disks to Oracle ASM (Automatic Storage Management)

I have a two-node Oracle RAC and I need to add disks to it.

The storage team presented three new LUNs to both nodes (rac1 & rac2):
RAC1RAC2_PROJECT335066_Data1 (Tier1, RAID5, Size: 50GB)
World Wide LUN ID# 6001-4380-05de-d87b-0000-5000-10ef-0000
RAC1RAC2_PROJECT335066_Logs1 (Tier1, RAID1, Size: 20GB)
World Wide LUN ID# 6001-4380-05de-d87b-0000-5000-10f3-0000
RAC1RAC2_PROJECT335066_Quo (Tier1, RAID5, Size: 1GB)
World Wide LUN ID# 6001-4380-05de-d87b-0000-5000-10f7-0000

The distinguishing part of each WWN, which I'll use later to grep the multipath output:

10ef
10f3
10f7
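The wwid that multipath expects is just the WWN in lower case with the dashes stripped and a leading 3 (the NAA identifier-type prefix that scsi_id reports), so each ID above can be converted with plain string manipulation, for example:

echo "6001-4380-05de-d87b-0000-5000-10ef-0000" | tr -d '-' | sed 's/^/3/'
36001438005ded87b0000500010ef0000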

My server has a fibre-channel HBA card, and I use systool to check the HBA information. systool comes with the libsysfs and sysfsutils packages; normally you would just run yum install sysfsutils, but I want to pull the packages from an internal repository, so I installed the RPMs directly instead of using yum.

root@rac1# systool -av -c fc_host
-bash: systool: command not found

root@rac1:~ # rpm -ivh http://172.22.19.185/rhel/redhat/rhel-x86_64-server-6/getPackage/libsysfs-2.1.0-7.el6.x86_64.rpm http://172.22.19.185/rhel/redhat/rhel-x86_64-server-6/getPackage/sysfsutils-2.1.0-7.el6.x86_64.rpm
Retrieving http://172.22.19.185/rhel/redhat/rhel-x86_64-server-6/getPackage/libsysfs-2.1.0-7.el6.x86_64.rpm
Retrieving http://172.22.19.185/rhel/redhat/rhel-x86_64-server-6/getPackage/sysfsutils-2.1.0-7.el6.x86_64.rpm
Preparing… ########################################### [100%]
1:libsysfs ########################################### [ 50%]
2:sysfsutils ########################################### [100%]

root@rac1:~ # systool -av -c fc_host | grep "Class Device =" | awk -F'=' {'print $2'} | awk -F'"' {'print "echo \"- - -\" > /sys/class/scsi_host/"$2"/scan"'}
echo "- - -" > /sys/class/scsi_host/host0/scan
echo "- - -" > /sys/class/scsi_host/host1/scan

root@rac1:~ # systool -av -c fc_host | grep "Class Device =" | awk -F'=' {'print $2'} | awk -F'"' {'print "echo \"- - -\" > /sys/class/scsi_host/"$2"/scan"'} | bash
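If you prefer not to build the echo commands through awk, the same rescan can be done with a plain shell loop (a sketch; it assumes every port under /sys/class/fc_host should be rescanned):

for h in /sys/class/fc_host/host*; do echo "- - -" > /sys/class/scsi_host/${h##*/}/scan; done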

Listing the disks under Oracle ASM

root@rac1:~ # /etc/init.d/oracleasm listdisks
OCR_VOTE_001
OCR_VOTE_002
OCR_VOTE_003
OCR_VOTE_004
OCR_VOTE_005
ORAARCH_001
ORADATA_001

Added the new WWIDs and aliases to /etc/multipath.conf:

multipath {
        wwid  36001438005ded87b0000500010ef0000
        alias asmdisk07
}
multipath {
        wwid  36001438005ded87b0000500010f30000
        alias asmdisk08
}
multipath {
        wwid  36001438005ded87b0000500010f70000
        alias votdisk04
}
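If your multipath.conf does not already have a multipaths section, these stanzas belong inside one, roughly like this (a sketch; only the first entry shown):

multipaths {
        multipath {
                wwid  36001438005ded87b0000500010ef0000
                alias asmdisk07
        }
        # repeat for asmdisk08 and votdisk04
}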

Then run multipath -r to reload the new multipath aliases
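Run it on rac1 now (and again on rac2 later, once its multipath.conf has the same entries):

root@rac1:~ # multipath -r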

Check if the aliases changed

root@rac1:~ # multipath -ll | grep 10ef
asmdisk07 (36001438005ded87b0000500010ef0000) dm-44 HP,HSV450
root@rac1:~ # multipath -ll | grep 10f3
asmdisk08 (36001438005ded87b0000500010f30000) dm-45 HP,HSV450
root@rac1:~ # multipath -ll | grep 10f7
votdisk04 (36001438005ded87b0000500010f70000) dm-46 HP,HSV450

The DBA team asked me to change the owner and group of the new disk devices:

root@rac1:~ # chown oracle:dba /dev/mapper/asmdisk07
root@rac1:~ # chown oracle:dba /dev/mapper/asmdisk08
root@rac1:~ # chown oracle:dba /dev/mapper/votdisk04
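Note that a plain chown on /dev/mapper devices does not survive a reboot. A common way to make the ownership persistent is a udev rule keyed on the device-mapper name; a minimal sketch, assuming a rule file and match key along these lines fit your distribution (the file name is illustrative):

# /etc/udev/rules.d/12-dm-permissions.rules (illustrative)
ENV{DM_NAME}=="asmdisk07", OWNER="oracle", GROUP="dba", MODE="0660"
ENV{DM_NAME}=="asmdisk08", OWNER="oracle", GROUP="dba", MODE="0660"
ENV{DM_NAME}=="votdisk04", OWNER="oracle", GROUP="dba", MODE="0660"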

Labeling disks

root@rac1:~ # /etc/init.d/oracleasm createdisk OCR_VOTE_006 votdisk04
Marking disk "OCR_VOTE_006" as an ASM disk: [ OK ]
root@rac1:~ # /etc/init.d/oracleasm createdisk ORADATA_002 asmdisk07
Marking disk "ORADATA_002" as an ASM disk: [ OK ]
root@rac1:~ # /etc/init.d/oracleasm createdisk ORADATA_003 asmdisk08
Marking disk "ORADATA_003" as an ASM disk: [ OK ]
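If you later want to confirm which label sits on which device, oracleasm querydisk run against the label will tell you:

root@rac1:~ # /etc/init.d/oracleasm querydisk ORADATA_002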

Checking that the new disks show up:

root@rac1:~ # /etc/init.d/oracleasm listdisks
OCR_VOTE_001
OCR_VOTE_002
OCR_VOTE_003
OCR_VOTE_004
OCR_VOTE_005
ORAARCH_001
ORADATA_001
ORADATA_002
ORADATA_003

On the other node, the disk list is not updated automatically:

root@rac2:~ # /etc/init.d/oracleasm listdisks
OCR_VOTE_001
OCR_VOTE_002
OCR_VOTE_003
OCR_VOTE_004
OCR_VOTE_005
ORAARCH_001
ORADATA_001

You need to perform the same steps on rac2 and then, instead of running oracleasm createdisk, run oracleasm scandisks.
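As a recap, that means rescanning the HBAs on rac2, adding the same wwid/alias entries to its /etc/multipath.conf, reloading multipath and fixing ownership; the commands simply mirror the rac1 steps above:

root@rac2:~ # systool -av -c fc_host | grep "Class Device =" | awk -F'=' {'print $2'} | awk -F'"' {'print "echo \"- - -\" > /sys/class/scsi_host/"$2"/scan"'} | bash
root@rac2:~ # multipath -r
root@rac2:~ # chown oracle:dba /dev/mapper/asmdisk07 /dev/mapper/asmdisk08 /dev/mapper/votdisk04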

root@rac2:~ # /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]

root@rac2:~ # /etc/init.d/oracleasm listdisks
OCR_VOTE_001
OCR_VOTE_002
OCR_VOTE_003
OCR_VOTE_004
OCR_VOTE_005
ORAARCH_001
ORADATA_001
ORADATA_002
ORADATA_003

Check which nodes an Oracle RAC cluster is running on

To check which nodes Oracle RAC is running on, run the command below:

root@linuxrac1:~ # cluvfy stage -post hwos -n all
Performing post-checks for hardware and operating system setup

Checking node reachability…
Node reachability check passed from node "linuxrac1".

Checking user equivalence…
User equivalence check passed for user "oracle".

Checking node connectivity…

Node connectivity check passed for subnet "142.40.236.0" with node(s) linuxrac1,linuxrac2.
Node connectivity check passed for subnet "192.168.254.8" with node(s) linuxrac1,linuxrac2.
Node connectivity check passed for subnet "172.22.16.0" with node(s) linuxrac1,linuxrac2.

Suitable interfaces for VIP on subnet "142.40.236.0":
linuxrac1 vlan1100:142.40.238.172
linuxrac2 vlan1100:142.40.238.173

Suitable interfaces for VIP on subnet "142.40.236.0":
linuxrac1 vlan1100:142.40.236.175
linuxrac2 vlan1100:142.40.236.177

Suitable interfaces for the private interconnect on subnet "192.168.254.8":
linuxrac1 vlan1102:192.168.254.9
linuxrac2 vlan1102:192.168.254.10

Suitable interfaces for the private interconnect on subnet "172.22.16.0":
linuxrac1 vlan1502:172.22.16.218 vlan1502:172.22.16.218
linuxrac2 vlan1502:172.22.16.219 vlan1502:172.22.16.219

Node connectivity check passed.

Checking shared storage accessibility…

WARNING:
Package cvuqdisk not installed.
linuxrac1,linuxrac2

No shared storage found.

Shared storage check was successful on nodes "linuxrac1,linuxrac2".

Post-check for hardware and operating system setup was successful.
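If all you need is the node list, olsnodes from the Clusterware home is a quicker check (the path below is an assumption; adjust it to your own CRS/Grid home, and -n adds the node numbers):

root@linuxrac1:~ # /u01/app/oracle/product/11.1.0/bin/olsnodes -n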

A Cluster with Solaris and Oracle RAC – Avoid panic when rebooting

You just rebooted a server that runs Oracle RAC, and the other node rebooted as well. This happens because CRS was still active: you need to stop it before rebooting, otherwise the surviving node reboots itself to preserve cluster integrity. You'll see this message in /var/adm/messages:

Jun 5 14:15:19 solaris_rac2 root: Oracle clsomon failed with fatal status 12.
Jun 5 14:15:20 solaris_rac2 root: Oracle CRS failure. Rebooting for cluster integrity.
rebooting…

Run this command to stop CRS. Your path may vary according to where you installed the binary.

root@solaris_rac1:~ # /u01/app/oracle/product/11.1.0/bin/crsctl stop crs
Stopping resources.
This could take several minutes.
Successfully stopped Oracle Clusterware resources
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.

Check the status with crsctl check crs. It should display the message below

root@solaris_rac1:~ # /u01/app/oracle/product/11.1.0/bin/crsctl check crs
Failure 1 contacting Cluster Synchronization Services daemon
Cannot communicate with Cluster Ready Services
Cannot communicate with Event Manager

After you bring the machine back up, CRS should start automatically.
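If it does not come back on its own, it can be started manually with the same binary (same path caveat as before):

root@solaris_rac1:~ # /u01/app/oracle/product/11.1.0/bin/crsctl start crs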

root@solaris_rac1:~ # /u01/app/oracle/product/11.1.0/bin/crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy

Problems mounting an OCFS2 filesystem under Linux – Transport endpoint is not connected

I had a problem mounting an OCFS2 filesystem on a cluster running Oracle Cluster Manager (oracm).

root@oracm1:~# mount -a
mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/oravg01-lvu03 on /u03. Check 'dmesg' for more information on this error.

The error message said to check dmesg for more information, and there it was: the cluster was not handshaking with the other node because the network idle timeout was different on each node.

root@oracm1:~# dmesg
Buffer I/O error on device sdab, logical block 262143
(8290,0):o2net_check_handshake:1180 node oracm2 (num 1) at 192.168.2.101:7777 uses a network idle timeout of 10000 ms, but we use 30000 ms locally.  disconnecting
(8211,0):dlm_request_join:901 ERROR: status = -107
(8211,0):dlm_try_to_join_domain:1049 ERROR: status = -107
(8211,0):dlm_join_domain:1321 ERROR: status = -107
(8211,0):dlm_register_domain:1514 ERROR: status = -107
(8211,0):ocfs2_dlm_init:2024 ERROR: status = -107
(8211,0):ocfs2_mount_volume:1133 ERROR: status = -107
ocfs2: Unmounting device (253,7) on (node 0)

I logged in to the other cluster node to check the cluster status. There, the heartbeat is active and the network idle timeout is 10000 ms.

root@oracm2:~# service o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold: 61
Network idle timeout: 10000
Network keepalive delay: 5000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active

root@oracm1:~# service o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold: 61
Network idle timeout: 30000
Network keepalive delay: 5000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

The host oracm1 needs to have exactly the same configuration as oracm2, so I invoked the o2cb script with the configure parameter to reconfigure the timeout.

root@oracm1:~# service o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [y]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Specify heartbeat dead threshold (>=7) [61]: 61
Specify network idle timeout in ms (>=5000) [30000]: 10000
Specify network keepalive delay in ms (>=1000) [5000]: 5000
Specify network reconnect delay in ms (>=2000) [2000]: 2000
Writing O2CB configuration: OK
O2CB cluster ocfs2 already online
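The values written by o2cb configure land in /etc/sysconfig/o2cb, so they can also be compared or edited there directly; roughly like this (variable names taken from the o2cb init script, and they may differ slightly between versions):

# /etc/sysconfig/o2cb (excerpt, sketch)
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
O2CB_HEARTBEAT_THRESHOLD=61
O2CB_IDLE_TIMEOUT_MS=10000
O2CB_KEEPALIVE_DELAY_MS=5000
O2CB_RECONNECT_DELAY_MS=2000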

I restarted the o2cb service

root@oracm1:~# service o2cb stop
root@oracm1:~# service o2cb start
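With the timeouts now matching on both nodes, the mount -a that failed at the start of this section goes through:

root@oracm1:~# mount -a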

And checked that the node entered the cluster.

root@oracm1:$ORACLE_HOME/oracm/log # grep "Successful reconfiguration" cm.log
Successful reconfiguration,  2 active node(s) node 1 is the master, my node num is 0 (reconfig 22) {Thu Jun 11 13:17:32 2009 }

root@oracm1:~# df -h /u03
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/oravg01-lvu03
30G   29G  1.1G  97% /u03
root@oracm1:~# mount | grep u03
/dev/mapper/oravg01-lvu03 on /u03 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)