February 11, 2013

[Oracle ASM Data Recovery] Analysis of the ORA-600 [kfcChkAio01] Error

Author: mac

If your ASM instance crashes frequently and you find the following errors in its alert.log, this article is worth reading:

NOTE: starting recovery of thread=1 ckpt=201.9904 group=2
NOTE: starting recovery of thread=2 ckpt=139.4186 group=2

Tue Dec 16 03:00:51 2008
Errors in file /u01/app/oracle/product/10.2.0/asm/admin/+ASM/udump/+asm2_ora_15305.trc:
ORA-00600: internal error code, arguments: [kfcChkAio01], [], [], [], [], [], [], []
ORA-15196: invalid ASM block header [kfc.c:5552] [endian_kfbh] [2079] [2147483648] [1 != 0]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup FLASH
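Checking an alert.log for this combination of messages can be scripted. The following is a minimal Python sketch, assuming the message text appears exactly as in the excerpt above; `scan_alert_log` and the pattern names are illustrative helpers, not an Oracle tool. It reports whether both ORA-00600 [kfcChkAio01] and ORA-15196 were seen and which diskgroups failed to mount.

```python
import re

# Signature lines copied from the alert.log excerpt above.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[kfcChkAio01\]")
ORA15196_RE = re.compile(r"ORA-15196: invalid ASM block header")
MOUNT_RE = re.compile(r"ERROR: ORA-600 signalled during mount of diskgroup (\S+)")


def scan_alert_log(lines):
    """Return (signature_found, failed_diskgroups) for an iterable of alert.log lines."""
    saw_600 = saw_15196 = False
    failed = []
    for line in lines:
        saw_600 = saw_600 or bool(ORA600_RE.search(line))
        saw_15196 = saw_15196 or bool(ORA15196_RE.search(line))
        m = MOUNT_RE.search(line)
        if m and m.group(1) not in failed:
            failed.append(m.group(1))
    # Both errors must be present for the kfcChkAio01 signature.
    return saw_600 and saw_15196, failed
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example `scan_alert_log(open(path, errors="replace"))` where `path` points at the ASM instance's alert.log.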

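This error causes the diskgroup to be dismounted, and it is generally caused by bug 7589862. You can further confirm that you are hitting this issue by checking the call stack in the trace file: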

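kfcChkAio <- kfcGet0 <- kfcGet1Priv <- kfcRcvGet <- kfcema <- kfrPass2 <- kfrcrv <- kfcMount <- kfgInitCache <- kfgFinalizeMount <- kfgscFinalize <- kfgForEachKfgsc <- kfgsoFinalize <- kfgFinalize <- kfxdrvMount <- kfxdrvEntry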
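The stack comparison can be sketched in Python. This assumes the stack has already been condensed into the "f1 <- f2 <- ..." form used in this article (innermost frame first); real trace files print the stack in a different layout, so extracting it is left out. The signature frames (kfcChkAio, kfrcrv, kfxdrvMount) are the ones this article discusses; the helper names are hypothetical.

```python
from typing import List

# Failing check, recovery routine and diskgroup mount driver, innermost first.
SIGNATURE = ("kfcChkAio", "kfrcrv", "kfxdrvMount")


def parse_stack(stack: str) -> List[str]:
    """Split a condensed 'f1 <- f2 <- f3' call stack into frame names (innermost first)."""
    return [frame.strip() for frame in stack.split("<-") if frame.strip()]


def matches_signature(stack: str) -> bool:
    """True if every signature frame is present, in innermost-to-outermost order."""
    frames = parse_stack(stack)
    try:
        positions = [frames.index(name) for name in SIGNATURE]
    except ValueError:  # a signature frame is missing
        return False
    return positions == sorted(positions)
```

Checking order as well as presence avoids matching an unrelated stack that merely happens to contain one of these functions.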

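The function kfxdrvMount is called when a diskgroup is mounted, and kfrcrv belongs to the ASM recovery layer. The main symptom of this problem is:

ORA-00600: internal error code, arguments: [kfcChkAio01], [], [], [], [], [], [], []

kfcChkAio01 means that an I/O operation failed because it encountered an invalid block.

ORA-15196: invalid ASM block header [kfc.c:5552] [endian_kfbh] [2079] [2147483648] [1 != 0]

This message describes the invalid block, where: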

endian_kfbh: the block header field that failed validation
2079: the ASM file number
2147483648: the ASM block number, as found in kfbh.block.blk
1 != 0: 1 is the value found in the field named by the first argument (endian_kfbh), but 0 was the expected value
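The block number is easier to read in hex (2147483648 is 0x80000000). A minimal Python sketch for splitting an ORA-15196 message into the fields listed above, assuming the message layout is exactly as shown; `BlockHeaderError` and `parse_ora15196` are illustrative names, not part of any Oracle utility.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Location first, then the four arguments explained above.
ORA15196_RE = re.compile(
    r"ORA-15196: invalid ASM block header "
    r"\[(?P<location>[^\]]+)\] \[(?P<field>[^\]]+)\] "
    r"\[(?P<file>\d+)\] \[(?P<block>\d+)\] "
    r"\[(?P<found>\S+) != (?P<expected>\S+)\]"
)


@dataclass
class BlockHeaderError:
    location: str      # source location reported with the error, e.g. kfc.c:5552
    field: str         # block header field that failed validation
    file_number: int   # ASM file number
    block_number: int  # raw value found in kfbh.block.blk
    found: str         # value found in `field`
    expected: str      # value ASM expected in `field`

    @property
    def block_hex(self) -> str:
        return f"0x{self.block_number:08X}"


def parse_ora15196(line: str) -> Optional[BlockHeaderError]:
    """Return the decoded arguments, or None if the line is not an ORA-15196 message."""
    m = ORA15196_RE.search(line)
    if m is None:
        return None
    return BlockHeaderError(
        location=m["location"],
        field=m["field"],
        file_number=int(m["file"]),
        block_number=int(m["block"]),
        found=m["found"],
        expected=m["expected"],
    )
```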

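Fixing this problem requires manually patching the ASM metadata. If you are not familiar with ASM internals, it is advisable to have a specialist do it. If you cannot resolve it yourself, the ASKMACLEAN professional Oracle database recovery team can help you recover your data!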

Ask_Maclean_liu_Oracle 2015-07-14

Understanding and fixing errors ORA-600 [kfcChkAio01] and ORA-15196. (Doc ID 757529.1)

Applies to:
Oracle Server - Enterprise Edition - Version: 10.1.0.3 to 11.1.0.7 - Release: 10.1 to 11.1
Information in this document applies to any platform.

Symptoms
Errors ORA-600 [kfcChkAio01] and ORA-15196 can be reported after a NON-CLEAN dismount of the diskgroup, normally caused by a crash of the ASM instance.
During the restart of the ASM instance and the mounting of the diskgroup, the following messages will be reported in the alert.log of the ASM instance:

* Messages indicating recovery:
NOTE: starting recovery of thread=1 ckpt=201.9904 group=2
NOTE: starting recovery of thread=2 ckpt=139.4186 group=2

* The messages about the errors ORA-600 and ORA-15196:
Tue Dec 16 03:00:51 2008
Errors in file /u01/app/oracle/product/10.2.0/asm/admin/+ASM/udump/+asm2_ora_15305.trc:
ORA-00600: internal error code, arguments: [kfcChkAio01], [], [], [], [], [], [], []
ORA-15196: invalid ASM block header [kfc.c:5552] [endian_kfbh] [2079] [2147483648] [1 != 0]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup FLASH

As a result, the diskgroup is dismounted.
Subsequent mounts will report the same set of errors.
Bug 7589862 was created for this case.

Cause
For the diagnosis and identification of the problem, important information is dumped into the trace file generated by the errors.

The call stack in the trace:
kfcChkAio <- kfcGet0 <- kfcGet1Priv <- kfcRcvGet <- kfcema <- kfrPass2 <- kfrcrv <- kfcMount <- kfgInitCache <- kfgFinalizeMount <- kfgscFinalize <- kfgForEachKfgsc <- kfgsoFinalize <- kfgFinalize <- kfxdrvMount <- kfxdrvEntry

Functions on the call stack indicate operations such as mount diskgroup (kfxdrvMount) and recovery (kfrcrv).

Description of the errors
ORA-00600: internal error code, arguments: [kfcChkAio01], [], [], [], [], [], [], []
kfcChkAio01 will be signaled if the IO operation failed because of an invalid block.

ORA-15196: invalid ASM block header [kfc.c:5552] [endian_kfbh] [2079] [2147483648] [1 != 0]
This error is reported when a block failed validation. The arguments:
endian_kfbh is the first field in the block header. This is the field that failed validation.
2079 is the ASM file number. Note that this value will be different in each case.
2147483648 is the block number found in kfbh.block.blk, another field in the block header.
Converted to hex, it is 0x80000000; the bytes on the right reference the block number.
1 != 0: 1 was the value found in the field referenced by the first argument, but 0 was the expected value.

The trace file will have the information about the Cache Element and Buffer Header affected by the error:

Start recovery for domain 2, valid = 0, flags = 0x4
NOTE: starting recovery of thread=1 ckpt=201.9904 group=2
NOTE: starting recovery of thread=2 ckpt=139.4186 group=2
CE: (0xc0000000153d0bb8) group=2 (FLASH) obj=2079 blk=0 (indirect)
hashFlags=0x0100 lid=0x0002 lruFlags=0x0000 bastCount=1
redundancy=0x11 fileExtent=0 AUindex=0 blockIndex=0
copy #0: disk=0 au=7492
BH: (0xc0000000153a54d0) bnum=322 type=rcv reading state=rcvRead chgSt=not modifying
flags=0x00000000 pinmode=excl lockmode=null bf=0xc000000015141000
kfbh_kfcbh.fcn_kfbh = -1.-1826817 lowAba=0.0 highAba=0.0
last kfcbInitSlot return code=null cpkt lnk is null

From the Cache Element, it is possible to identify the disk and allocation unit involved with the error:
copy #0: disk=0 au=7492

From the alert.log it is possible to identify the path of the disk. Review the file back in time and identify the last time the diskgroup was mounted without errors. Check for messages like:
NOTE: cache opening disk 0 of grp 2: FLASH_0000 path:/dev/rdsk/c29t1d4

* The second argument of error ORA-15196 indicates the ASM file number involved with the problem.
This can also be validated with some of the information printed in the trace file, by searching for the words KSTDUMP In-memory trace dump:

KSTDUMP: In-memory trace dump
TIME(usecs):SEQ# ORAPID SID EVENT OP DATA
========================================================================
88894E39:000E0839 16 255 10495 20 kfcMoveLRU: gn=2 fn=2079 indblk=218 src=5 dest=2 line=3201
88894E39:000E083A 16 255 10495 3 kfcAddPin: pin=267 kfc.c 3289 excl bnum=189 class=0
88894E3B:000E083B 16 255 10495 10 kfcbpInit: gn=2 fn=2079 indblk=219 pin=268 excl rcvRead kfr.c 5524
88894E3C:000E083C 16 255 10495 12 kfcFlush: bnum=190 kfc.c 3179
88894E3C:000E083D 16 255 10495 11 kfcMakeFree: bnum=190 flags=00000000 kfc.c 3180
88894E3D:000E083E 16 255 10495 19 kfcMoveBucket: [ gn=2 fn=2079 indblk=26 ] --> [ gn=2 fn=2079 indblk=219 ]

From this line:
88894E39:000E0839 16 255 10495 20 kfcMoveLRU: gn=2 fn=2079 indblk=218 src=5 dest=2 line=3201
gn=2 is the diskgroup number
fn=2079 is the ASM file number
indblk=218 is the block where the indirect extent is stored
All the references in the In-memory trace dump will be for 256 blocks of the same file, in this case 2079.

Validating the content of the Allocation Unit using kfed
Using kfed to dump the blocks of the Allocation Unit referenced in the Cache Element will show invalid data:

$kfed read /dev/rdsk/c29t1d4 aunum=7492 blknum=0 ausize=1048576|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 66 ; 0x001: 0x42
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 89088 ; 0x004: T=0 NUMB=0x15c00
kfbh.block.obj: 11626 ; 0x008: TYPE=0x0 NUMB=0x2d6a
kfbh.check: 2182659237 ; 0x00c: 0x8218bca5
kfbh.fcn.base: 4293140479 ; 0x010: 0xffe41fff
kfbh.fcn.wrap: 4294967295 ; 0x014: 0xffffffff
kfbh.spare1: 4294967247 ; 0x018: 0xffffffcf
kfbh.spare2: 4294967295 ; 0x01c: 0xffffffff

All 256 blocks (0 through 255) will have similar content.
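The steps above (find the disk path for the AU named in the Cache Element, dump every block with kfed, check the block type) can be sketched in Python. This is a minimal sketch under stated assumptions: the kfed command format, the 1 MB AU size and the 256 blocks per AU are copied from the note's example, `kfed_commands` and `check_kfed_block` are hypothetical helpers, and running the generated commands is left to the reader. Read as signed 32-bit integers, fcn.wrap 4294967295 is -1 and fcn.base 4293140479 is -1826817, which appears to match the `fcn_kfbh = -1.-1826817` value in the Buffer Header, assuming it is printed as wrap.base.

```python
import re

# Patterns for the Cache Element, alert.log and kfed lines quoted in the note.
COPY_RE = re.compile(r"copy #\d+: disk=(\d+) au=(\d+)")
OPEN_RE = re.compile(r"NOTE: cache opening disk (\d+) of grp (\d+): \S+ path:(\S+)")
KFED_RE = re.compile(r"^(kfbh\.[\w.]+):\s+(\d+)\s*;\s*0x[0-9a-fA-F]+:\s*(.*)$")


def kfed_commands(ce_text, alert_lines, group, ausize=1048576, nblocks=256):
    """Build one 'kfed read' command per block of each AU named in the Cache Element."""
    paths = {}
    for line in alert_lines:  # later 'cache opening' messages overwrite earlier ones
        m = OPEN_RE.search(line)
        if m and int(m.group(2)) == group:
            paths[int(m.group(1))] = m.group(3)
    return [
        f"kfed read {paths[int(disk)]} aunum={au} blknum={blk} ausize={ausize}"
        for disk, au in COPY_RE.findall(ce_text)
        if int(disk) in paths  # skip disks whose path was not found
        for blk in range(nblocks)
    ]


def to_signed32(value):
    """Reinterpret an unsigned 32-bit kfed value as signed."""
    return value - (1 << 32) if value >= (1 << 31) else value


def check_kfed_block(text):
    """Parse kfed output for one block; report KFBTYP_INVALID and the fcn as wrap.base."""
    fields = {}
    for line in text.splitlines():
        m = KFED_RE.match(line.strip())
        if m:
            fields[m.group(1)] = (int(m.group(2)), m.group(3).strip())
    fcn = "{}.{}".format(to_signed32(fields["kfbh.fcn.wrap"][0]),
                         to_signed32(fields["kfbh.fcn.base"][0]))
    return {"invalid": fields["kfbh.type"][1] == "KFBTYP_INVALID", "fcn": fcn}
```

Restricting the alert.log lookup to the diskgroup number from the Cache Element (group=2 here) avoids picking up a disk 0 that belongs to a different diskgroup.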
The type will be KFBTYP_INVALID, which indicates that the content/type of the block is incorrect.

These errors occur because, during a file creation, ASM incorrectly commits the allocation of an indirect extent before pre-formatting the extent to contain valid blocks. Thus, if a crash occurs in the middle of this operation, recovery finds the blocks of the indirect extent unformatted (kfbh.type: 0 ; 0x002: KFBTYP_INVALID) and signals the errors already mentioned.
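The ordering problem can be illustrated with a toy model in Python. This is not ASM code: the disk dictionary, the `committed` set and the two block types are hypothetical simplifications, assuming only that crash recovery validates every block of a committed indirect extent. Committing first and crashing before the pre-format step leaves a committed AU full of invalid blocks; pre-formatting first means a crash leaves nothing committed for recovery to trip over.

```python
from enum import Enum

NBLOCKS = 256  # blocks per AU in the note's example (blknum 0..255)


class BlockType(Enum):
    INVALID = 0    # unformatted block, as kfed reports with KFBTYP_INVALID
    FORMATTED = 1  # hypothetical stand-in for a valid indirect-extent block


def new_disk(aus):
    """A toy 'disk' on which every AU starts out with unformatted blocks."""
    return {au: [BlockType.INVALID] * NBLOCKS for au in aus}


def allocate_indirect_extent(disk, committed, au, commit_first, crash_between):
    """commit_first=True models the buggy order (commit, then pre-format).

    crash_between=True stops after the first step, like an instance crash.
    """
    def commit():
        committed.add(au)

    def pre_format():
        disk[au] = [BlockType.FORMATTED] * NBLOCKS

    first, second = (commit, pre_format) if commit_first else (pre_format, commit)
    first()
    if crash_between:  # the instance dies before the second step runs
        return
    second()


def crash_recovery(disk, committed):
    """Recovery reads every committed AU and rejects unformatted blocks."""
    for au in sorted(committed):
        for blk, btype in enumerate(disk[au]):
            if btype is BlockType.INVALID:
                raise RuntimeError(f"invalid ASM block header: au={au} blk={blk}")
```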
