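An ORACLE ASM add disk operation can complete normally, before the disk group rebalance has even started, and a hardware fault on the newly added disk can then wipe out all of the metadata written to that disk after the add, the disk header included. Because the diskgroup uses EXTERNAL REDUNDANCY and already counts the new DISK as a member, losing all metadata on that DISK means the diskgroup can no longer be MOUNTed.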
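Moreover, because all metadata on the newly added disk is lost, and not just the KFBTYP_DISKHEAD block of the disk header, it is not enough to restore KFBTYP_DISKHEAD by using kfed merge with header information taken from another disk and then edited. Working around the problem requires special manual handling.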
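As a rough aid for telling a lost header block apart from an intact one, the sketch below classifies the text that `kfed read` prints for a block. It only parses output that has already been captured; the exact layout of the `kfbh.type` line and the helper names `block_type` and `header_looks_valid` are assumptions made for illustration, not part of the original procedure.

```python
import re
from typing import Optional

# Matches the block-type line of captured `kfed read` output, for example:
#   kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
# The layout of this line is an assumption based on typical kfed output.
_TYPE_LINE = re.compile(
    r"^kfbh\.type:\s+(\d+)\s*;\s*0x[0-9a-fA-F]+:\s*(\S+)", re.MULTILINE
)


def block_type(kfed_output: str) -> Optional[str]:
    """Return the symbolic block type found in kfed output, or None if absent."""
    match = _TYPE_LINE.search(kfed_output)
    return match.group(2) if match else None


def header_looks_valid(kfed_output: str) -> bool:
    """True only when the block is reported as a disk header block."""
    return block_type(kfed_output) == "KFBTYP_DISKHEAD"
```

A header that reads as anything other than KFBTYP_DISKHEAD shows that block 0 is damaged, but it says nothing about the rest of the metadata on the disk, which is the harder part of the scenario described here.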
The following ASM alert log shows an example:
SUCCESS: diskgroup TESTDG03 was created
NOTE: cache deleting context for group TESTDG03 1/0x86485c30
NOTE: cache registered group TESTDG03 number=1 incarn=0xab385c36
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0xab385c36
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:21:07 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 92 for pid 20, osid 20176
Thu Jan 29 08:21:07 2015
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0xAB385C36 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.0
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:21:07 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 0.10750
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.0 ABA 2.0
NOTE: setting 11.2 start ABA for group TESTDG03 thread 1 to 2.0
NOTE: cache mounting group 1/0xAB385C36 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0xab385c36
GMON querying group 1 at 93 for pid 13, osid 4612
Thu Jan 29 08:21:07 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: CREATE DISKGROUP TESTDG03 EXTERNAL REDUNDANCY DISK '/oracleasm/asm-disk01' SIZE 129500M , '/oracleasm/asm-disk02' SIZE 128800M , '/oracleasm/asm-disk03' SIZE 129200M , '/oracleasm/asm-disk04' SIZE 128800M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
Thu Jan 29 08:21:07 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:21:23 2015
SQL> alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
ORA-15032: not all alterations performed
ORA-15260: permission denied on ASM disk group
ERROR: alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
Thu Jan 29 08:21:31 2015
SQL> alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
NOTE: Assigning number (1,4) to disk (/oracleasm/asm-disk06)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk TESTDG03_0004
NOTE: requesting all-instance disk validation for group=1
Thu Jan 29 08:21:32 2015
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: initiating PST update: grp = 1
Thu Jan 29 08:21:32 2015
GMON updating group 1 at 94 for pid 21, osid 22706
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xab385c36 (TESTDG03)
GMON querying group 1 at 95 for pid 13, osid 4612
NOTE: cache opening disk 4 of grp 1: TESTDG03_0004 path:/oracleasm/asm-disk06
GMON querying group 1 at 96 for pid 13, osid 4612
SUCCESS: refreshed membership for 1/0xab385c36 (TESTDG03)
SUCCESS: alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
NOTE: Attempting voting file refresh on diskgroup TESTDG03
Thu Jan 29 08:22:09 2015
SQL> alter diskgroup testdg03 dismount
NOTE: cache dismounting (clean) group 1/0xAB385C36 (TESTDG03)
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: [email protected] (TNS V1-V3)
Thu Jan 29 08:22:10 2015
NOTE: LGWR doing clean dismount of group 1 (TESTDG03)
NOTE: LGWR closing thread 1 of diskgroup 1 (TESTDG03) at ABA 2.15
NOTE: cache dismounted group 1/0xAB385C36 (TESTDG03)
Thu Jan 29 08:22:10 2015
GMON dismounting group 1 at 97 for pid 21, osid 22730
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
SUCCESS: diskgroup TESTDG03 was dismounted
NOTE: cache deleting context for group TESTDG03 1/0xab385c36
Thu Jan 29 08:22:10 2015
NOTE: diskgroup resource ora.TESTDG03.dg is offline
SUCCESS: alter diskgroup testdg03 dismount
NOTE: diskgroup resource ora.TESTDG03.dg is updated
SQL> alter diskgroup testdg03 mount
NOTE: cache registered group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:22:22 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 100 for pid 21, osid 22730
Thu Jan 29 08:22:22 2015
NOTE: Assigning number (1,4) to disk ()
GMON querying group 1 at 101 for pid 21, osid 22730
NOTE: cache dismounting (clean) group 1/0x83F85C5F (TESTDG03)
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: [email protected] (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x83F85C5F (TESTDG03)
NOTE: cache ending mount (fail) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache deleting context for group TESTDG03 1/0x83f85c5f
GMON dismounting group 1 at 102 for pid 21, osid 22730
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup TESTDG03 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "4" is missing from group number "1"
ERROR: alter diskgroup testdg03 mount
Thu Jan 29 08:27:37 2015
SQL> alter diskgroup testdg03 mount
NOTE: cache registered group TESTDG03 number=1 incarn=0x56985c64
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x56985c64
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:27:43 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 105 for pid 21, osid 23017
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0x56985C64 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.609
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:27:43 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 2.15
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.609 ABA 3.16
NOTE: cache mounting group 1/0x56985C64 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0x56985c64
GMON querying group 1 at 106 for pid 13, osid 4612
Thu Jan 29 08:27:43 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: alter diskgroup testdg03 mount
Thu Jan 29 08:27:43 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:33:52 2015
SQL> alter diskgroup testdg03 check all norepair
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:33:52 2015
GMON checking disk 0 for group 1 at 107 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 108 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 109 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 110 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all norepair
Thu Jan 29 08:34:07 2015
SQL> alter diskgroup testdg03 check all
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:34:07 2015
GMON checking disk 0 for group 1 at 111 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 112 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 113 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 114 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all
SQL> alter diskgroup testdg03 check all repair
NOTE: starting check of diskgroup TESTDG03
GMON checking disk 0 for group 1 at 115 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 116 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 117 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 118 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all repair
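In the failed mount above, the telltale line is `NOTE: Assigning number (1,4) to disk ()`: the disk number is known from the surviving metadata, but no device path could be matched to it. A minimal sketch for spotting such lines in alert log text follows; the function name `disks_without_path` is hypothetical, and the pattern is derived only from the lines shown in this log.

```python
import re
from typing import List, Tuple

# Pattern taken from the alert log lines above, e.g.
#   NOTE: Assigning number (1,4) to disk ()
_ASSIGN = re.compile(
    r"NOTE: Assigning number \((\d+),(\d+)\) to disk \(([^)]*)\)"
)


def disks_without_path(alert_log: str) -> List[Tuple[int, int]]:
    """Return (group, disk) pairs that were assigned a number but no path."""
    return [
        (int(group), int(disk))
        for group, disk, path in _ASSIGN.findall(alert_log)
        if not path.strip()
    ]
```

Because the pattern is applied to the whole text rather than line by line, it works whether or not the log has kept its original line breaks.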
[oracle@mlab2 oracleasm]$ oerr ora 15042
15042, 00000, "ASM disk \"%s\" is missing from group number \"%s\""
// *Cause: The specified disk, which is a necessary part of a diskgroup,
// could not be found on the system.
// *Action: Check the hardware configuration.
//
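The ORA-15042 error is raised precisely because all metadata on the added disk has been lost. The irony is that the new disk may hold no meaningful data at all, since rebalance had not yet started; but because ASM considers the disk already added, the disk must be available before the diskgroup can be mounted. The user cannot even force a DROP of the DISK, because drop disk requires the DISKGROUP to be in the MOUNT state. The result is a chicken-and-egg deadlock: dropping the disk requires MOUNTing the DISKGROUP, but MOUNTing the DISKGROUP requires dropping the DISK first.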
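Working around this problem generally requires a 诗檀软件 engineer to patch the ASM metadata by hand; alternatively, if an earlier copy of the ASM metadata exists, it can be used instead.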