Upgrade GI/CRS 11.1.0.7 to 11.2.0.2. Rootupgrade.sh Hanging

We installed the 11gR2 GI software and, when the installer prompted us to run rootupgrade.sh, applied the PSU2 patches first. rootupgrade.sh then hung on the first node.

[root@vrh8 client]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

cluvfy passed with 2 ignorable errors:

[root@vrh8 vrh8]# cd /tmp
[root@vrh8 tmp]# df -lh .
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/vg0-tmp  992M  263M  679M  28% /tmp

[root@vrh8 grid]# grep fail cluvfy_during_inst.log
/tmp l118464lwap1049 /tmp 713MB 1GB failed
Result: Free disk space check failed for "l118464lwap1049:/tmp"
/tmp vrh8 /tmp 692.131MB 1GB failed
Result: Free disk space check failed for "vrh8:/tmp"
Result: Check for multiple users with UID value 0 failed

We followed section 1A of "How to Proceed from Failed Upgrade to 11gR2 Grid Infrastructure on Linux/Unix [ID 969254.1]"; it didn't help.

[root@vrh8 bin]# ./crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.7.0]

rootupgrade.sh output:

[root@vrh8 11.2.0.2]# ./rootupgrade.sh
Running Oracle 11g root script...
The following environment variables are set as:
ORACLE_OWNER= oracrs
ORACLE_HOME= /d22/oracrs/11.2.0.2
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /d22/oracrs/11.2.0.2/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.

**** It hung here for more than 2 hours, so we cancelled it:

INT at /d22/oracrs/11.2.0.2/crs/install/crsconfig_lib.pm line 1173.
/d22/oracrs/11.2.0.2/perl/bin/perl -I/d22/oracrs/11.2.0.2/perl/lib -I/d22/oracrs/11.2.0.2/crs/install /d22/oracrs/11.2.0.2/crs/install/rootcrs.pl execution failed
Oracle root script execution aborted!

1. The logs below are required to analyze this issue:
NEW_GRID_HOME/cfgtoollogs/crsconfig/*.*
NEW_GRID_HOME/log/<nodename>/*.*
Please zip and upload everything under these directories, including the subdirectories.

2. While rootupgrade.sh was hanging, did you check the usage of /tmp? Was its free space being exhausted?

=== ODM Research ===

The root script has been run several times for this upgrade. I have taken the first incident from the file rootcrs_vrh8.log:
-----------------------------------------
2011-02-13 13:07:55: Successfully started requested Oracle stack daemons
2011-02-13 13:07:55: Upgrading the existing voting disks!
2011-02-13 13:07:55: Executing /d22/oracrs/11.2.0.2/bin/cssvfupgd
2011-02-13 13:07:55: Executing cmd: /d22/oracrs/11.2.0.2/bin/cssvfupgd <<<<<<<<<<<<<<< The root script seems to hang at this point.
2011-02-13 15:01:16: ###### Begin DIE Stack Trace ######
2011-02-13 15:01:16:     Package         File                 Line Calling
2011-02-13 15:01:16:     --------------- -------------------- ---- ----------
2011-02-13 15:01:16:  1: main            rootcrs.pl            325 crsconfig_lib::dietrap
2011-02-13 15:01:16:  2: crsconfig_lib   crsconfig_lib.pm     9301 main::__ANON__
2011-02-13 15:01:16:  3: crsconfig_lib   crsconfig_lib.pm     9301 (eval)
2011-02-13 15:01:16:  4: crsconfig_lib   crsconfig_lib.pm     9260 crsconfig_lib::system_cmd_capture1
2011-02-13 15:01:16:  5: crsconfig_lib   crsconfig_lib.pm     9247 crsconfig_lib::system_cmd_capture
2011-02-13 15:01:16:  6: crsconfig_lib   crsconfig_lib.pm      924 crsconfig_lib::system_cmd
2011-02-13 15:01:16:  7: oracss          oracss.pm             275 crsconfig_lib::run_crs_cmd
2011-02-13 15:01:16:  8: crsconfig_lib   crsconfig_lib.pm     1019 oracss::CSS_upgrade
2011-02-13 15:01:16:  9: crsconfig_lib   crsconfig_lib.pm     1006 crsconfig_lib::start_cluster
2011-02-13 15:01:16: 10: main            rootcrs.pl            697 crsconfig_lib::perform_start_cluster
2011-02-13 15:01:16: ####### End DIE Stack Trace #######

cssvfupgd.log:
--------------------
Oracle Database 11g Clusterware Release 11.2.0.2.0 - Production Copyright 1996, 2010 Oracle. All rights reserved.
2011-02-13 13:07:55.356: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.361: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 13:07:55.365: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.369: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
2011-02-13 13:07:55.373: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.377: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat
2011-02-13 13:07:55.402: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.misscount
2011-02-13 13:07:55.404: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.disktimeout
2011-02-13 13:07:55.406: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.reboottime
2011-02-13 13:07:55.408: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.diagwait
2011-02-13 13:07:55.414: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.pollinterval
2011-02-13 13:07:55.416: [CSSVFUPG][3605955376]cssvfupgd_GetGUID: Fetching GUID for /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 13:07:55.419: [ SKGFD][3605955376]NOTE: No asm libraries found in the system
2011-02-13 13:07:55.419: [ CLSF][3605955376]Allocated CLSF context
2011-02-13 13:07:55.419: [ SKGFD][3605955376]Discovery with str:/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 13:07:55.419: [ SKGFD][3605955376]UFS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 13:07:55.420: [ SKGFD][3605955376]Fetching UFS disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 13:07:55.420: [ SKGFD][3605955376]OSS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 13:07:55.421: [ SKGFD][3605955376]Handle 0x124de360 from lib :UFS:: for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 14:19:31.132: [ SKGFD][3605955376]WARNING:io_getevents timed out 2226 sec >>>>>>>>>>>>>>>>>>>> After about one hour it reports a timeout.
2011-02-13 14:19:31.132: [ SKGFD][3605955376]WARNING:io_getevents timed out 2226 sec

The script has stalled at the voting disk upgrade phase. Please provide the details below.

1. What cluster file system are you using for the voting files? Provide its details and the mount options used. For OCFS, get the mount options with:
mount | grep ocfs
2. Voting disk details:
ls -l /s01/app/ocrvot/VOTEDISK/UAT2_vdisk*
3. Get the diagwait detail:
OLD_CRS_HOME/bin/crsctl get css diagwait

1. What cluster file system are you using for the voting files? Provide its details and the mount options used.
/dev/emcpowera1 on /s01/app/ocrvot type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)

2. Voting disk details
[root@vrh8 11.2.0.2]# ls -l /s01/app/ocrvot/VOTEDISK/UAT2_vdisk*
-rw-r----- 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
-rw-r----- 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
-rw-r----- 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat

3. Get the diagwait detail
crsctl get css diagwait
Failure 33 in main Oracle Cluster Registry context initialization: PROC-33: Oracle Cluster Registry is not configured
Operating System error [No such file or directory] [2]

An OWC may not be required now, as the issue we face is clear. The diagwait query should not error out, as explained in the following note:
11gR2 rootupgrade.sh Fails as cssvfupgd Can not Upgrade Voting Disk (Doc ID 1102283.1)
Make sure you are running 'crsctl get css diagwait' from the old CRS home. You can also check it on multiple nodes. If it errors out, this has to be fixed as explained in the above note.
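The diagwait check requested above could be scripted roughly as follows. This is a minimal sketch and not part of the SR: the function name `diagwait_status` is hypothetical, and treating any output that contains `PROC-` or `Failure` as an error is an assumption based only on the failure text shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by
#   OLD_CRS_HOME/bin/crsctl get css diagwait
# Assumption: a failure contains "PROC-" or "Failure", as in the PROC-33
# output above. Other non-empty text only means no such marker was seen.
diagwait_status() {
    case "$1" in
        '')                echo "unknown" ;;
        *PROC-*|*Failure*) echo "error" ;;
        *)                 echo "no_error_marker" ;;
    esac
}

# Example use (OLD_CRS_HOME must point at the old 11.1 home, not the new grid home):
#   out=$("$OLD_CRS_HOME/bin/crsctl" get css diagwait 2>&1)
#   [ "$(diagwait_status "$out")" = "error" ] && echo "see Doc ID 1102283.1"
```

Running the same check on each node makes it easy to confirm whether the error is node-specific before rerunning rootupgrade.sh.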
Following that note, when I run ./oprocd stop I get an error:

[root@l118464lwap1049 bin]# ./oprocd stop
Jun 16 23:24:42.966 | ERR | failed to connect to daemon, errno(111)

ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.

cssvfupgd.log
2011-02-13 23:36:49.311: [ OCRRAW][3394941744]prgval:buffer passed is too small
2011-02-13 23:36:49.315: [CSSVFUPG][3394941744]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
2011-02-13 23:36:49.319: [ OCRRAW][3394941744]prgval:buffer passed is too small
2011-02-13 23:36:49.323: [CSSVFUPG][3394941744]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat
2011-02-13 23:36:49.351: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.misscount
2011-02-13 23:36:49.354: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.disktimeout
2011-02-13 23:36:49.356: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.reboottime
2011-02-13 23:36:49.358: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.diagwait
2011-02-13 23:36:49.367: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.pollinterval
2011-02-13 23:36:49.369: [CSSVFUPG][3394941744]cssvfupgd_GetGUID: Fetching GUID for /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 23:36:49.371: [ SKGFD][3394941744]NOTE: No asm libraries found in the system
2011-02-13 23:36:49.372: [ CLSF][3394941744]Allocated CLSF context
2011-02-13 23:36:49.372: [ SKGFD][3394941744]Discovery with str:/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 23:36:49.372: [ SKGFD][3394941744]UFS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 23:36:49.372: [ SKGFD][3394941744]Fetching UFS disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 23:36:49.372: [ SKGFD][3394941744]OSS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-13 23:36:49.372: [ SKGFD][3394941744]Handle 0x98c4360 from lib :UFS:: for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

Question: in your update about cssvfupgd.log you stated that it was hanging there. Is there a timeout entry in that log file after about 70 minutes, like this:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]Handle 0x98c4360 from lib :UFS:: for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-17 0:48:19.372: [ SKGFD][3394941744]WARNING:io_getevents timed out 4294 sec <<<< present ???

Please provide the following outputs:
rpm -qa|grep ocfs2
uname -a
cat /etc/redhat-release

[root@vrh8 ~]# rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5
[root@vrh8 ~]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@vrh8 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
[root@vrh8 ~]#

Combinations that installed successfully:
OEL5.4+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
OEL5.6+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
OEL5.6+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHEL5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
RHEL5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHEL5.4

Combinations that failed:
RHEL5.6(redhat kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHEL5.6(redhat kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3

The problem reproduces with the Red Hat kernel: RHEL 5.6 with 2.6.18-2xx kernels.

Please review the following note to change the location of your voting disk:
Note 428681.1 Title: How to ADD/REMOVE/REPLACE/MOVE Oracle Cluster Registry (OCR) and Voting Disk

Pasting info from -- Oracle®
Clusterware Administration and Deployment Guide 11g Release 2 (11.2)
3 Managing Oracle Cluster Registry and Voting Disks
Oracle Universal Installer for Oracle Clusterware 11g release 2 (11.2), does not support the use of raw or block devices. However, if you upgrade from a previous Oracle Clusterware release, then you can continue to use raw or block devices.

[oracrs@vrh8 grid]$ grep fail cluvfy_during_inst_061711.log
/tmp l118464lwap1049 /tmp 706MB 1GB failed
Result: Free disk space check failed for "l118464lwap1049:/tmp"
/tmp vrh8 /tmp 927.1312MB 1GB failed
Result: Free disk space check failed for "vrh8:/tmp"
Result: Check for multiple users with UID value 0 failed
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed

[oracrs@vrh8 grid]$ ./runcluvfy.sh stage -pre crsinst -n vrh8,l118464lwap1049 -verbose|tee cluvfy_during_inst.log

Please upload the following Cluvfy trace log:
$ORA_CRS_HOME/cv/log/cvutrace.log.0

Please download the latest CVU from OTN:
http://www.oracle.com/technetwork/database/clustering/downloads/cvu-download-homepage-099973.html

Please upload /s02/app/crs/11.2.0.2/log/vrh8/agent/ohasd/oraagent_oracrs/oraagent_oracrs.log
In addition, please upload /s02/app/crs/11.2.0.2/log/vrh8/agent/ohasd/oracssdagent_root/oracssdagent_root.log

Please run this command on both the new setup and your existing production setup for a quick comparison:
rpm -qa|grep ocfs2

Server with issue:
[root@vrh8 ohasd]# rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5

Prod:
[root@vrh9 bin]# rpm -qa|grep ocfs2
ocfs2-2.6.18-194.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-194.8.1.el5-1.4.7-1.el5

[root@vrh8 ~]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@vrh8 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5

From Bug 11876815 (Doc ID 1321757.1):
@ Combinations that installed successfully:
@ .
@ OEL5.4+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ OEL5.6+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ OEL5.6+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHEL5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ RHEL5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHEL5.4
@ .
@ Combinations that failed:
@ RHEL5.6(redhat kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHEL5.6(redhat kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ .
@ So it is clear that this is a Red Hat kernel problem. Since RHEL 5.6, Red Hat
@ has provided 2.6.18-2xx kernels. We can't fix Red Hat kernels, so please use the
@ Oracle Enterprise kernel (Red Hat compatible) for installation.

As per the last action plan (if one was conveyed), you need to contact Red Hat support to find the cause of this issue. The workaround is to not use OCFS and to use raw devices instead so that the upgrade succeeds.

Oracle bug 11876815 was logged internally for this hang. A few combinations of OEL, RHEL and OCFS2 were tested, and the combination you are using did not work for us either (per the internal bug updates quoted above).

The solution provided by the Oracle bug developer is to use OEL rather than RHEL, or to contact Red Hat support to identify the cause and a solution (in case they have already tested this setup).

Let me know if Red Hat support is already engaged, and provide the case ID so that I can open an internal SR for an Oracle/Red Hat Joint Escalation Team (JET) engagement, so that both vendors can work on this together.
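The bug notes quoted above tie the hang to RHEL 5.6 with 2.6.18-2xx kernels. A quick way to flag hosts in that range might look like the sketch below. The function name `kernel_range_check` is hypothetical, and the check looks only at the release string: it cannot tell a Red Hat build from an Oracle-built compatible kernel, so a match means "compare against the tested combinations", not "affected".

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel release string (as printed by
# `uname -r`) falls in the 2.6.18-2xx range named in the bug notes.
# Assumption: matching on the string alone is only a hint; the vendor that
# built the kernel is not visible here.
kernel_range_check() {
    case "$1" in
        2.6.18-2[0-9][0-9].*) echo "in-range" ;;
        *)                    echo "not-in-range" ;;
    esac
}

# Example use on a live host:
#   kernel_range_check "$(uname -r)"
```

With the two hosts from this thread, the server with the issue (2.6.18-238.5.1.el5) falls inside the range, while the production server's 2.6.18-194 kernels fall outside it.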
+ The SR issue with the grid upgrade from 11.1 to 11.2.0.2.2 is resolved:
- the voting disk was moved from OCFS to a raw device, as a workaround for Bug 11876815
- TMP and TEMP were set to a new directory with available space before running the installer, so that the prechecks succeed
- GI PSU#2 was applied before the rootupgrade.sh step
- the rootupgrade.sh step was successful on all nodes
- the post-upgrade checks and logs were verified to confirm that the GI upgrade succeeded
+ The DB upgrade to 11.2.0.2 plus PSU#2 will be resumed shortly
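The TMP/TEMP step in the list above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `has_free_mb` and the directory `/u01/tmp` are hypothetical, and the 1024 MB threshold is taken from the 1GB requirement reported by cluvfy earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 has at least $2 MB free.
# Uses POSIX `df -Pk`; the second output line carries the available space
# in KB in column 4.
has_free_mb() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# Example use before starting the installer (/u01/tmp is a placeholder path):
#   if has_free_mb /u01/tmp 1024; then
#       TMP=/u01/tmp; TEMP=/u01/tmp; export TMP TEMP
#   else
#       echo "not enough free space in /u01/tmp" >&2
#   fi
```

Checking the space first avoids pointing the installer at a directory that would fail the same free-space precheck as /tmp did.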