Saturday, January 15, 2011

How to add a node to 11gR2 Grid Infrastructure (the right and wrong way)


Last week, I was fortunate enough to attempt adding a couple of nodes to my existing 11gR2 RAC. I had read over several documents (see below for references) to prepare but, having never done this before, when is one really prepared for it?

I installed the OS and the relevant PowerPath software for connectivity to the EMC storage. Little did I know that things were going to be a lot tougher than I had initially thought.

The documentation was straightforward enough, basically: prepare the new nodes (install RPMs, ASMLib, kernel parameters, users, equivalency, directories, etc.). But I came across one issue after another.

9 PM on a Tuesday:

The existing cluster's kernel was a few versions behind current, and when the new servers were updated (through up2date) their kernel versions were upgraded as well. So, I went ahead and updated the existing servers and, at the same time, accepted a software update through the RHEL OS (outside of up2date). Well, it just so happens that PowerPath will not work with this latest version of the kernel (lesson 1), which I found out after 3 hours on chat with them at 2 AM. So, I downgraded the kernel and presto! The PowerPath software decided to work.

Then there was the step for actually running the script to extend the RAC (only the Grid Infrastructure) onto the new nodes. I had created the target $GRID_HOME with the exact same permissions as on the existing nodes, where $GRID_HOME is owned by "root". Unfortunately for me, the addNode.sh script is run as the "grid" user, which will never be able to copy any files into the $GRID_HOME because of permission issues (lesson 2). Sadly, the installer is not friendly enough to point that out – neither at the beginning nor anywhere during the addNode.sh process. After an hour of waiting – and debugging the running process on the target node – I tried changing the owner, and that seemed to work! I thought it was simply amazing how easy Oracle has made it to extend the CRS Stack onto a new node.

6 AM Wednesday:

At this point I'm sick with the flu, sleepy and extremely tired. Having successfully added the two new nodes to the cluster and verified with "crsctl stat res -t" that all nodeapps were indeed running, I was pleased enough with myself to fall asleep. A few hours later – and not much luck with sleep – I was back on, this time extending the database software. This step is also very straightforward and runs without much hassle (and within a few minutes).

Now, at this point I had shiny new nodes but no database instances on them! I had decided to make use of the new hardware by using one of the new nodes for the existing database and the (older) node it freed up for a new database.

So, from
RACDB1 -> RAC1, RAC2 to
RACDB1-> RAC1, RAC3 and
RACDB2-> RAC2, RAC4

But, for the time being, I left it at
RACDB1-> RAC1, RAC2 and
RACDB2-> RAC2, RAC4

10 PM on Wednesday:

I was able to create the new database on RAC2 and RAC4, as well as import its data, which took the rest of the day (until around 10 PM). Before going to sleep, I wanted to at least finish up with RACDB1, so I started dbca and moved the instances around as described above. This task had its own issues because Java was being a real pain! So, I decided to manually create the second instance of RACDB1 on RAC3 – this worked, and I had successfully moved the instances (a rough sketch of the manual steps follows).
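For reference, here is a rough sketch of what creating an instance manually involves, assuming a new instance RACDB12 of RACDB1 on RAC3. The instance name, thread number and sizes are examples, OMF on ASM is assumed, and some of these objects will already exist if you are only relocating an instance rather than adding a brand new one:

[oracle@RAC1 ~]$ sqlplus / as sysdba
SQL> ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 5 SIZE 100M, GROUP 6 SIZE 100M;
SQL> CREATE UNDO TABLESPACE UNDOTBS2 DATAFILE SIZE 500M;
SQL> ALTER SYSTEM SET instance_number=2 SCOPE=SPFILE SID='RACDB12';
SQL> ALTER SYSTEM SET thread=2 SCOPE=SPFILE SID='RACDB12';
SQL> ALTER SYSTEM SET undo_tablespace='UNDOTBS2' SCOPE=SPFILE SID='RACDB12';
SQL> ALTER DATABASE ENABLE PUBLIC THREAD 2;
[oracle@RAC1 ~]$ srvctl add instance -d RACDB1 -i RACDB12 -n RAC3
[oracle@RAC1 ~]$ srvctl start instance -d RACDB1 -i RACDB12

The new node also needs an init<SID>.ora pointing at the shared spfile, a password file, and an oratab entry before the instance will start.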

Sometime during this process, RAC1's CRS Daemon crashed and would not restart. I tried a deconfigure and re-ran root.sh on RAC1, but it was of no use. After checking the log files (see below for locations) I found there were core dumps.

1 AM on Thursday:

Time to call in the cavalry – I created an SR with Oracle. I got a call from a rather inexperienced fellow in India who didn't seem to understand, until two hours into my call, that CRS wasn't running – I had a very bad feeling about this. I gave him all the required logs and even pointed out where crsd.log was throwing core dumps, but he wasn't particularly interested. I specifically told him to call me when he needed further information, but all he did was update the ticket two hours later (needless to say, I was fast asleep and didn't know any better). Fortunately, my colleague had picked up the ticket and sent the relevant information over.

12 PM on Thursday:

When I woke, I took over the ticket and received steps from the new engineer, who basically wanted me to remove the two nodes from the cluster and re-register them. I wasn't doing that until an engineer was on a web conference with me. Conveniently, his shift was almost over, so I was transferred to the next engineer, who didn't contact me back until a few hours later.

She was very pleasant and helpful. We went ahead and completely removed the two nodes (RAC1 and RAC3) from the cluster and re-added them. Remember how RAC1 was having issues with CRS? Well, now RAC3 decided to do the same, except it was a different error. Don't worry, I've gone through all these steps below. So, after we spent another 6 hours debugging RAC3 (which wasn't even the initially problematic node!!), she said her shift was done and she would transfer the SR to the next engineer – oh, I could have shot someone by this point.

8 PM on Thursday:

Did I mention it was my anniversary the next day? And I still hadn’t bought a gift for my wife…sigh.

After that call, I decided to deconfigure and re-run root.sh on RAC3, which fixed it! The CRS Stack was running, I started the instance and reconfigured the services. There were still issues with registering RAC3 with the cluster: the VIP wasn't running, the node needed to be pinned, and the nodeapps weren't running!

Around 8:40 PM on Thursday, I received a call from the fourth engineer (from Australia), who seemed to know what he was talking about. He was able to fix all the immediate issues on RAC3 with the nodeapps, instance startup, etc.

10:30 PM on Thursday:

At around 10:30 PM I told him that since the one node was running, we could pick it up on Monday, but I still went ahead and sent him the core dumps and alert logs for the CRS Daemon. Before falling asleep at around 1 AM, I noticed an update to the ticket where he had asked me to check the "grid" and "oracle" user permissions against the working nodes. Sure enough, they were different, and as soon as I fixed them, RAC1 was back in the cluster.

I was then able to reconfigure all the services for RACDB1, and sleep peacefully after 2 days… the end!


So, if you're still interested – after that lengthy story – in knowing what I went through to add the two nodes, here are the steps:


Current Setup:

OS:                  RHEL 5.5 (Tikanga) 64-bit
Kernel:              2.6.18-194.26.1.0.1.el5
SAN:                 EMC Clarion CX4 240
Nodes:               RAC1, RAC2
Grid Version:        Grid Infrastructure 11gR2 (11.2.0.2)
Database Version:    10gR2 (10.2.0.4.3)

Addition:            RAC3, RAC4
SAN:                 Additional tray


Make sure that:

·         Kernel versions MUST match between all nodes.

·         Check the interconnect speed between the nodes. For me, anything between 30 and 40 Mbps is acceptable on a 4G fiber NIC.

·         Install all Oracle Recommended RPMs.

·         EMC PowerPath and Navisphere agents must be installed. Confirm with EMC whether your kernel version is supported (see the quick check below)! As of January 2011, kernel-2.6.32-100.0.19.el5 is not supported by the EMC PowerPath software.
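A quick way to check what you are running before asking EMC (a sketch; the exact package name can differ between PowerPath releases):

[root@RAC3 ~]# uname -r                     # running kernel
[root@RAC3 ~]# rpm -qa | grep -i emcpower   # installed PowerPath package, if any
[root@RAC3 ~]# powermt version              # version reported by PowerPath itself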

·         All /dev/emcpower* devices MUST be in the correct order among all nodes sharing the same ASM instance.

·         Copy /etc/udev/rules.d/63-oracle-raw.rules and /etc/udev/rules.d/99-raw.rules to the new node(s).

This will map the /dev/emcpower* devices to /dev/raw/raw* devices. We are not using ASMLib as a pass-through for disk management, rather mapping ASM directly to the raw devices.
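For reference, the rules look roughly like this (device names, numbers and ownership below are examples; copy the real files from an existing node rather than retyping them):

ACTION=="add", KERNEL=="emcpowera1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="raw1", OWNER="grid", GROUP="asmadmin", MODE="0660"

One such pair of lines exists per LUN/partition, and the second line is also what gives the "grid" user ownership of the raw device.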

To be on the safe side, export the mappings from one of the existing nodes and import them on the new node(s):
Export (on an existing node, e.g. RAC1):   [root@RAC1 ~]# emcpadm export_mappings -f lunmap.txt
Import (on the new node, after copying lunmap.txt over):   [root@RAC3 ~]# emcpadm import_mappings -v -f lunmap.txt
Verify (compare existing node vs. new):    [root@RAC3 ~]# powermt display dev=all

Check that the /dev/raw/* devices are readable and writable by the "grid" user:
[grid@RAC3 ~]$ dd if=/dev/zero of=/dev/raw/raw34 bs=1024 count=100
[grid@RAC3 ~]$ dd if=/dev/raw/raw34 of=/tmp/raw.input bs=1024 count=100

·         Install ASMLib Drivers for the current Kernel. Copy /etc/sysconfig/oracleasm-_dev_oracleasm from existing nodes to the new ones.
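A quick sanity check that ASMLib matches the running kernel and can see the disks (a sketch):

[root@RAC3 ~]# uname -r                          # running kernel
[root@RAC3 ~]# rpm -qa | grep oracleasm          # oracleasm-<kernel>, oracleasm-support, oracleasmlib
[root@RAC3 ~]# /etc/init.d/oracleasm status      # is the module loaded and the filesystem mounted?
[root@RAC3 ~]# /etc/init.d/oracleasm scandisks
[root@RAC3 ~]# /etc/init.d/oracleasm listdisks   # should list the same labels as the existing nodes (if any disks are labeled)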


·         Create “grid” and “oracle” users. This extract is from /etc/group
oinstall:x:1000:grid,oracle
asmadmin:x:1200:grid,oracle
asmdba:x:1201:grid,oracle
asmoper:x:1202:grid,oracle
dba:x:1300:oracle
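A minimal sketch of creating them with matching IDs (the GIDs come from the extract above; the UIDs here are examples, just make them identical on every node):

[root@RAC3 ~]# groupadd -g 1000 oinstall
[root@RAC3 ~]# groupadd -g 1200 asmadmin
[root@RAC3 ~]# groupadd -g 1201 asmdba
[root@RAC3 ~]# groupadd -g 1202 asmoper
[root@RAC3 ~]# groupadd -g 1300 dba
[root@RAC3 ~]# useradd -u 1100 -g oinstall -G asmadmin,asmdba,asmoper grid
[root@RAC3 ~]# useradd -u 1101 -g oinstall -G dba,asmadmin,asmdba,asmoper oracle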

·         Create the "grid" and "oracle" homes (copy existing permissions). For $GRID_HOME, make sure it is owned by the "grid" user and not "root" (see the sketch below)!!! Ownership is changed to root after the node is added to the cluster; if root owns it beforehand, the addNode.sh script will get stuck trying to copy files to the new node, since it connects as the "grid" user.
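A minimal sketch, reusing the $GRID_HOME path that appears later in this post (adjust the path and mode to match your existing nodes):

[root@RAC3 ~]# mkdir -p /usr/app/ora/product/11.2.0/grid
[root@RAC3 ~]# chown -R grid:oinstall /usr/app/ora/product/11.2.0/grid
[root@RAC3 ~]# chmod -R 775 /usr/app/ora/product/11.2.0/grid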

·         Mimic the profile settings for both users from existing nodes (Change the ORACLE_SID values).

·         Set up equivalency between the nodes for both users (a key-generation and verification sketch follows below).
On the new node:
[grid@RAC3 ~]$ mkdir ~/.ssh
[grid@RAC3 ~]$ chmod 700 ~/.ssh
On an existing node, copy its authorized_keys to the new node:
[grid@RAC1 .ssh]$ scp authorized_keys RAC3:/home/grid/.ssh
Back on the new node, append its own public key and push the file out to all the other nodes:
[grid@RAC3 ~]$ cd /home/grid/.ssh
[grid@RAC3 .ssh]$ cat id_dsa.pub >> authorized_keys
[grid@RAC3 .ssh]$ scp authorized_keys RAC1:/home/grid/.ssh
[grid@RAC3 .ssh]$ scp authorized_keys RAC2:/home/grid/.ssh
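If the new node does not have a key pair yet, here is a minimal sketch (DSA keys, to match the id_dsa.pub used above; repeat as both the "grid" and "oracle" users):

[grid@RAC3 ~]$ ssh-keygen -t dsa                  # accept the defaults and use an empty passphrase
[grid@RAC3 ~]$ chmod 600 ~/.ssh/authorized_keys

Then verify that passwordless SSH works in every direction with no prompts:

[grid@RAC3 ~]$ for n in RAC1 RAC2 RAC3 RAC4; do ssh $n date; done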

To extend the cluster for Grid Infrastructure (ASM and Clusterware only):

Run the addNode.sh script. If you are not using GNS (which would assign the VIPs automatically), only the following is needed, with the node names passed explicitly:

[grid@RAC3  ~]$ $GRID_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={RAC3}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={RAC3-priv}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={RAC3-vip}"

Always add one node at a time; it should NOT take more than 30 minutes to an hour to complete. If it does, check the permissions on $GRID_HOME on the target node(s).
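Before and after running addNode.sh, the cluster verification utility can be run from an existing node to catch missing prerequisites (a sketch; node name as in this example):

[grid@RAC1 ~]$ $GRID_HOME/bin/cluvfy stage -pre nodeadd -n RAC3 -verbose
[grid@RAC1 ~]$ $GRID_HOME/bin/cluvfy stage -post nodeadd -n RAC3 -verbose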

When the copy is complete, run the two scripts as root. The last script will start the CRS Stack. Should there be any issues while starting, check the following log locations:

$GRID_HOME/log/<nodename>/* (specifically alert<nodename>.log, crsd/crsd.log and crsd/crsdOUT.log). They are invaluable sources of information.

Sometimes ASM will not mount all of the diskgroups, so connect to the ASM instance and check their status:

SELECT name, status FROM v$asm_diskgroup;

If they are DISMOUNTED, then simply mount them (a sketch follows) and try to restart the CRS Stack. You may need to stop the stack with the '-f' option.
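A minimal sketch of the mount step, assuming a diskgroup named DATA (use the names returned by the query above):

[grid@newnode ~]$ sqlplus / as sysasm
SQL> ALTER DISKGROUP DATA MOUNT;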
root@newnode# $GRID_HOME/bin/crsctl stop crs -f
root@newnode# $GRID_HOME/bin/crsctl start crs

Wait a few minutes (patience is a golden virtue!!!!) and check CRS.
root@newnode# $GRID_HOME/bin/crsctl check crs
If it still fails, then try to deconfigure and re-run root.sh

root@newnode# $GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force
root@newnode# $GRID_HOME/root.sh

It usually fixes any kinks, but if that doesn’t work then try to reboot the node and make sure the ASM Disks are mounted.

If a node is throwing coreXXXX.dmp files in the $GRID_HOME/log/<nodename>/crsd/ directory, then check the permissions!! I spent two days debugging this issue before the Oracle engineer pointed it out to me.

Removing GRID_HOME and re-adding the node to the cluster (yay!)

To debug the issue, I had to completely remove all traces of the $GRID_HOME from RAC1 and RAC3 in order to:
a) get rid of the coreXXXX.dmp files (on RAC1), and
b) eliminate any possibility that either of the two nodes was corrupted.

Run this on both nodes first:
root@RAC1/3# $GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force
In order to do a complete cleanup, oraInventory/ContentsXML/inventory.xml had to be updated to remove RAC1 and RAC3. The following command does the trick (run it from RAC2 and RAC4), but make sure that the oraInventory directory isn't locked.

root@RAC2# $GRID_HOME/oui/bin/runInstaller -updateNodeList "CLUSTER_NODES={RAC2,RAC4}" ORACLE_HOME="/usr/app/ora/product/11.2.0/grid" -defaultHomeName LOCAL_NODE="RAC2"

Ensure there are no references to RAC1 and RAC3 in the existing cluster:

$GRID_HOME/bin/crsctl delete node -n RAC1
$GRID_HOME/bin/crsctl delete node -n RAC3
$GRID_HOME/bin/crsctl stat res -t | grep RAC1
$GRID_HOME/bin/crsctl stat res -t | grep RAC3
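To double-check from one of the surviving nodes, olsnodes shows the current node list, status and pin state (a quick sketch):

[grid@RAC2 ~]$ $GRID_HOME/bin/olsnodes -s -t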

Delete $GRID_HOME on both nodes. It helped me even more because I cleaned up most of the unnecessary log files.

Fix the permissions on the $GRID_HOME and make "grid" the owner, otherwise you'll sit there for an eternity waiting for the addNode.sh script to finish. root.sh will re-apply the permissions and make "root" the owner afterwards.

Run addNode.sh as mentioned above and then the two scripts afterward (it might just ask you to run only one).

Then what happened??

After root.sh was run on RAC3, the CRS Stack still wouldn't start, and after 4 hours of back and forth with Oracle Support I decided to deconfigure.

crsd.log was showing this error repeatedly:

2010-12-30 18:20:32.508: [  CRSOCR][1925030480] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=8, opn=kgfoIO02, dep=27091, loc=kgfokge
ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4
] [3]
2010-12-30 18:20:32.508: [    CRSD][1925030480][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-12-30 18:20:32.508: [    CRSD][1925030480] Done.

I ran the "deconfigure and root.sh" option after checking the disk permissions on /dev/raw/raw* and /dev/emcpower* against the existing working nodes (they were the same) and presto, RAC3 worked!

RAC1 was the original problem, since it kept reporting the coreXXXX.dmp errors in the crsd log directory. I fortunately came across a talented and sharp Oracle engineer who was able to go through the core dump files and identify that the permissions were not set correctly on RAC1. Since RAC3 was working at this point, I simply mimicked its permissions for the "grid" and "oracle" users. If this happens to you, go through /etc/group to make sure the group memberships match up exactly! After that was done, RAC1's CRS Stack started and I breathed easy again.
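A quick way to compare the two users across nodes is to look at the output of "id" side by side (node names as in this example; the uid, gid and group lists should match exactly):

[root@RAC3 ~]# id grid; id oracle     # on the working node
[root@RAC1 ~]# id grid; id oracle     # on the broken node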



To add the Oracle Database software to the new node(s), run:

$ORACLE_HOME/oui/bin/addNode.sh
This is a GUI tool that can be used to install (copy) the software onto the new nodes (a silent variant is sketched below).
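It can also be run silently, in the same style as the Grid Infrastructure addNode.sh above (a sketch; node list as in this example):

[oracle@RAC1 ~]$ $ORACLE_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={RAC3,RAC4}"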



Links I found to be extremely useful during this adventure.

