一套Windows上的11.2.0.1单实例数据库在database open阶段出现了ORA-07445:core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []内部错误,具体的出错日志如下:
LOG CONTENT =======================ALERT.LOG============================ Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Picked latch-free SCN scheme 2 Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST ARCH: Warning; less destinations available than specified by LOG_ARCHIVE_MIN_SUCCEED_DEST init.ora parameter Autotune of undo retention is turned on. IMODE=BR ILAT =84 2011-08-01 13:13:47.068000 +08:00 LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options. Using parameter settings in server-side spfile C:\APP\PRODUCT\11.2.0\DBHOME_1\DATABASE\SPFILEG11R2.ORA System parameters with non-default values: _spin_count = 2000 processes = 500 event = "10500 trace name context forever,level 8:10013 trace name context forever,level 10: 10015 trace name context forever,level 10" sga_max_size = 600M shared_pool_size = 152M large_pool_size = 32M java_pool_size = 4M streams_pool_size = 0 _db_file_direct_io_count = 12 sga_target = 0 memory_target = 0 control_files = "C:\APP\ORADATA\G11R2\CONTROLFILE\O1_MF_6VWCSH9J_.CTL" control_files = "C:\APP\FLASH_RECOVERY_AREA\G11R2\CONTROLFILE\O1_MF_6VWCSHNF_.CTL" db_block_checksum = "TRUE" db_block_size = 8192 db_cache_size = 196M _shared_io_pool_size = 0 compatible = "11.2.0.0.0" log_archive_dest_2 = "service=stdby optional lgwr sync affirm valid_for=(online_logfiles,all_roles)" log_buffer = 10485760 db_create_file_dest = "C:\app\oradata" db_recovery_file_dest = "C:\app\flash_recovery_area" db_recovery_file_dest_size= 500000M undo_tablespace = "UNDOTBS1" _kgl_bucket_count = 2 remote_login_passwordfile= "EXCLUSIVE" db_domain = "" session_cached_cursors = 300 audit_file_dest = "C:\APP\ADMIN\G11R2\ADUMP" optimizer_features_enable= "10.2.0.4" audit_trail = "DB" cell_offload_plan_display= "ALWAYS" db_name = "G11R2" open_cursors = 3000 _optimizer_extended_cursor_sharing_rel= "NONE" pga_aggregate_target = 300M diagnostic_dest = "C:\APP" 2011-08-01 13:13:48.164000 +08:00 PMON started with pid=2, OS id=984 VKTM started with pid=3, OS id=3656 at elevated priority VKTM running at (10)millisec precision with DBRM quantum (100)ms GEN0 started with pid=4, OS id=5824 DIAG started with pid=5, OS id=5832 DBRM started with pid=6, OS id=2784 PSP0 started with pid=7, OS id=2500 DIA0 started with pid=8, OS id=5320 MMAN started with pid=9, OS id=4128 DBW0 started with pid=10, OS id=5852 LGWR started with pid=11, OS id=3960 CKPT started with pid=12, OS id=4472 SMON started with pid=13, OS id=5788 RECO started with pid=14, OS id=6036 MMON started with pid=15, OS id=5740 MMNL started with pid=16, OS id=2112 ORACLE_BASE from environment = C:\app alter database mount exclusive 2011-08-01 13:13:52.390000 +08:00 Sweep [inc][135908]: completed NSS2 started with pid=19, OS id=2728 Sweep [inc][135901]: completed Successful mount of redo thread 1, with mount id 2704081164 Database mounted in Exclusive Mode 2011-08-01 13:13:53.413000 +08:00 Lost write protection disabled 2011-08-01 13:13:54.578000 +08:00 Sweep [inc][135897]: completed Sweep [inc2][135908]: completed Sweep [inc2][135901]: completed Sweep [inc2][135897]: completed 2011-08-01 13:13:55.788000 +08:00 Completed: alter database mount exclusive alter database open Beginning crash recovery of 1 threads parallel recovery started with 3 processes 2011-08-01 13:13:56.959000 +08:00 Started redo scan Completed redo scan read 0 KB redo, 0 data blocks need recovery Started redo application at Thread 1: logseq 867, block 88140, scn 9122496 Recovery of Online Redo Log: Thread 1 Group 3 Seq 867 Reading mem 0 Mem# 0: C:\APP\ORADATA\G11R2\ONLINELOG\O1_MF_3_6VWCSMPO_.LOG Mem# 1: C:\APP\FLASH_RECOVERY_AREA\G11R2\ONLINELOG\O1_MF_3_6VWCSNGX_.LOG Completed redo application of 0.00MB Completed crash recovery at Thread 1: logseq 867, block 88140, scn 9142497 0 data blocks read, 0 data blocks written, 0 redo k-bytes read 2011-08-01 13:13:58.738000 +08:00 LGWR: STARTING ARCH PROCESSES ARC0 started with pid=22, OS id=4784 2011-08-01 13:13:59.765000 +08:00 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES ARC1 started with pid=24, OS id=2780 ARC2 started with pid=25, OS id=1288 ARC1: Archival started LGWR: Primary database is in MAXIMUM AVAILABILITY mode ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR ARC3 started with pid=26, OS id=3876 2011-08-01 13:14:00.828000 +08:00 ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE NSS2 started with pid=19, OS id=5156 2011-08-01 13:14:29.008000 +08:00 ORA-16198: LGWR received timedout error from KSR 2011-08-01 13:14:35.980000 +08:00 Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_lgwr_3960.trc: ORA-16198: Timeout incurred on internal channel during remote archival LGWR: Error 16198 verifying archivelog destination LOG_ARCHIVE_DEST_2 Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED LGWR: Continuing... ARCH: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch 2011-08-01 13:14:38.629000 +08:00 Trying to expand controlfile section 11 for Oracle Managed Files Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466] Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_arc0_4784.trc (incident=136091): ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] [] Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136091\g11r2_arc0_4784_i136091.trc 2011-08-01 13:14:40.283000 +08:00 Trace dumping is performing id=[cdmp_20110801131440] 2011-08-01 13:14:52.417000 +08:00 Sweep [inc][136091]: completed Sweep [inc2][136091]: completed 2011-08-01 13:14:59.805000 +08:00 ARC2: Detected ARCH process failure ARC2: STARTING ARCH PROCESSES ARC0 started with pid=19, OS id=5016 2011-08-01 13:15:00.836000 +08:00 ARC0: Archival started ARC2: STARTING ARCH PROCESSES COMPLETE 2011-08-01 13:15:36.689000 +08:00 Deleted Oracle managed file C:\APP\FLASH_RECOVERY_AREA\G11R2\ARCHIVELOG\2011_08_01\O1_MF_1_866_73DFKWRK_.ARC 2011-08-01 13:15:38.013000 +08:00 Error 12154 received logging on to the standby Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc: ORA-12154: TNS:could not resolve the connect identifier specified ARCH: Error 12154 Creating archive log file to 'stdby' Trying to expand controlfile section 11 for Oracle Managed Files Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466] Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc (incident=136051): ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] [] Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc 2011-08-01 13:15:39.680000 +08:00 Trace dumping is performing id=[cdmp_20110801131539] 2011-08-01 13:15:42.782000 +08:00 PMON (ospid: 984): terminating the instance due to error 397 2011-08-01 13:15:50.520000 +08:00 Instance terminated by PMON, pid = 984 =============================g11r2_ora_4852_i136051.trc============================= Dump file c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options Windows NT Version V6.1 Service Pack 1 CPU : 4 - type 586, 2 Physical Cores Process Affinity : 0x0x00000000 Memory (Avail/Total): Ph:2122M/3566M, Ph+PgF:5413M/7130M, VA:1084M/2047M Instance name: g11r2 Redo thread mounted by this instance: 1 Oracle process number: 17 Windows thread id: 4852, image: ORACLE.EXE (SHAD) *** 2011-08-01 13:15:38.527 *** SESSION ID:(197.1) 2011-08-01 13:15:38.527 *** CLIENT ID:() 2011-08-01 13:15:38.527 *** SERVICE NAME:() 2011-08-01 13:15:38.527 *** MODULE NAME:(oradim.exe) 2011-08-01 13:15:38.527 *** ACTION NAME:() 2011-08-01 13:15:38.527 Dump continued from file: c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] [] ========= Dump for incident 136051 (ORA 7445 [kcflfi()+466]) ======== ----- Beginning of Customized Incident Dump(s) ----- Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466] Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production Process Id: 0x000010bc Thread Id : 0x000012f4 Time : Mon Aug 01 13:15:38 Excp. Code: 0xc0000094 Excp. Type: INT_DIVIDE Flags: 0x00000000 ------------------- Registers ---------------------------- eip = 0500282e esp = 0d9f525c ebp = 0d9f577c edi = 37eefe00 esi = 00000265 eax = 00000265 ebx = 00000000 ecx = 089ee234 edx = 00000000 ecs = 0000001b eds = 00000023 ees = 00000023 ess = 00000023 egs = 00000000 efs = 0000003b eflags = 00010246 ------------------- End of Registers --------------------- *** 2011-08-01 13:15:38.536 dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0) ----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) ----- alter database open ----------- messages from pre-loading .sym files: Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different ----------- end of messages from pre-loading .sym files ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different EnumerateLoadedModules64 failed with error -1073741819 Symbol file oraclsra11.SYM does not match binary. Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different Symbol file oracommon11.SYM does not match binary. Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different __VInfreq__kcflfi() 00000000 +466 _kccrszf()+287 CALLrel _kcflfi() 0 318345B8 34 31C0DD40 4000 265 4 7FFFFFFF 1 0 0 _kccrsd_expd()+1418 CALLrel _kccrszf() D9F7CEC 268 264 _kccwnc_reuse_expan CALLrel _kccrsd_expd() D9F7CEC B 38 d()+640 __VInfreq__kccwnc() CALLrel _kccwnc_reuse_expan D9F7CEC B 26 +235 d() _krse_arc_complete( CALLrel _kccwnc() D9F7CEC D9F6D38 B )+1615 _krse_arc_driver_co CALLrel _krse_arc_complete( D9F78AC re()+1307 ) _krse_arc_driver()+ CALLrel _krse_arc_driver_co D9F7CEC 1 D9F7C6C 0 0 D9F7CC8 274 re() 0 0 0 0 0 0 0 _krsq_arch_to_force CALLrel _krse_arc_driver() D9F7CEC 1 D9F7C6C 0 0 D9F7CC8 _switch()+196 0 0 0 0 0 0 0 __VInfreq__kcttsc() CALLrel _krsq_arch_to_force D9F7CEC 1 +129 _switch() _kcfopd()+1504 CALLrel _kcttsc() 2 _adbdrv()+16700 CALLrel _kcfopd() 0 0 0 0 D9FBBF8 _opiexe()+13594 CALLrel _adbdrv() 4A C0000094 33644518 D9FBD38 6D60697 2F3FC5F0 _opiosq0()+6248 CALLrel _opiexe() 4 0 D9FC704 _kpooprx()+277 CALLrel _opiosq0() 3 E D9FC970 A4 0 _kpoal8()+632 CALLrel _kpooprx() D9FF074 D9FD3F8 13 1 0 A4 _opiodr()+1248 CALLreg 00000000 5E 1C D9FF070 ___dyn_tls_init_cal CALLreg 00000000 5E 1C D9FF070 1 lback()+2935122 _opitsk()+1404 CALL??? 00000000 C9A10E8 5E D9FF070 0 D9FED00 D9FF19C 53E52E 0 D9FF1C8 _opiino()+980 CALLrel _opitsk() 0 0 _opiodr()+1248 CALLreg 00000000 3C 4 D9FFBC4 _opidrv()+1201 CALLrel _opiodr() 3C 4 D9FFBC4 0 _sou2o()+55 CALLrel _opidrv() 3C 4 D9FFBC4 _opimai_real()+124 CALLrel _sou2o() D9FFBD4 3C 4 D9FFBC4 _opimai()+125 CALLrel _opimai_real() 2 D9FFBFC _OracleThreadStart@ CALLrel _opimai() 2 D9FFF3C 0 70 FFFFFFFF 4()+830 FFFFFFFF ___dyn_tls_init_cal CALLptr 00000000 901FF6C D9FFFD4 776437F5 lback()+366382316 901FF6C 765D34CB 0 ___dyn_tls_init_cal CALLreg 00000000 901FF6C 765D34CB 0 0 901FF6C lback()+367384440 0 ___dyn_tls_init_cal CALLrel ___dyn_tls_init_cal 401326 901FF6C 0 0 0 0 lback()+367384392 lback()+367384403 00000000 CALL??? 00000000 --------------------- Binary Stack Dump --------------------- ..................
从以上日志中可以看到在”Trying to expand controlfile section 11 for Oracle Managed Files“扩扎控制文件过程中出现了
_kccwnc_reuse_expan->_kccrsd_expd->_kccrszf->_kcflfi->_VInfreq__kcflfi()
函数的7445错误,kcf意为(manages and coordinates operations on the control file(s),kcf.c),是在处理日志文件中引发了INT_DIVIDE_BY_ZERO除数为零的代码bug。
通过7445和kcflfi关键词在MOS上搜索没有太大的发现,说明该Bug的处罚几率非常低,正好让我碰到说明是某些特殊参数的设置引起了该问题。
目标锁定启动日志中的非默认隐藏参数”_db_file_direct_io_count”,该参数决定了直接路径读写的IO大小,从9i开始该参数的单位调整为bytes而非原先的blocks,之前因为对该参数进行一些测试所以设置了一个较小值。
Parameter: DB_FILE_DIRECT_IO_COUNT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Versions: 8.0 - 8.1 This parameter is hidden in 9.0 onwards. Parameter type: integer Parameter class: dynamic, scope = ALTER SYSTEM DEFERRED Default value: 64 Range of values: operating system-dependent Description: ~~~~~~~~~~~~ DB_FILE_DIRECT_IO_COUNT is used to specify the number of blocks to be used for IO operations done by backup, restore or direct path read and write functions. The IO buffer size is a product of DB_FILE_DIRECT_IO_COUNT and DB_BLOCK_SIZE. The IO buffer size cannot exceed max_IO_size for your platform. Assigning a high value to this parameter results in greater use of PGA or SGA memory. o In Oracle8i, minimize the number of I/O requests by setting the DB_FILE_DIRECT_IO_COUNT instance parameter so that DB_BLOCK_SIZE x DB_FILE_DIRECT_IO_COUNT = max_io_size of system In Oracle8i the default for this is 64 blocks. (In Oracle9i, it is replaced by _DB_FILE_DIRECT_IO_COUNT which governs the size of direct I/Os in BYTES (not blocks). The default is 1Mb but will be sized down if the max_io_size of the system is smaller.) ORA-19863 during RMAN duplicate Applies to: Oracle Server - Enterprise Edition - Version: 10.2.0.3 This problem can occur on any platform. Symptoms -- Problem Statement: Duplicate failed during the datafile restore stage: Starting restore at 2008-Apr-09 09:28:24 using channel ORA_AUX_DISK_1 channel ORA_AUX_DISK_1: starting datafile backupset restore channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set restoring datafile 00003 to /u06/oradata/hcmprdc/sysaux01.dbf ... restoring datafile 00121 to /u06/oradata/hcmprdc/waapp.dbf channel ORA_AUX_DISK_1: reading from backup piece /u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of Duplicate Db command at 04/09/2008 09:28:28 RMAN-03015: error occurred in stored script Memory Script ORA-19870: error reading backup piece /u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1 ORA-19863: device block size 1040384 is larger than max allowed: 262144 Cause The database parameter _db_file_direct_io_count in the target and auxiliary instance does not match. Solution -- To implement the solution: Ensure that parameter _db_file_direct_io_count on the target and auxiliary database the same _DB_FILE_DIRECT_IO_COUNT need to be set to the same value between the source database where the backup was taken and the target database where the backup is being restored. 2.0 Size of Input/Output Buffers ================================ a. input buffers ---------------- NOTE : DB_FILE_DIRECT_IO_COUNT is not available in Oracle9i onwards. In Oracle9i, it is replaced by a hidden _DB_FILE_DIRECT_IO_COUNT which governs the size of direct I/Os in BYTES (not blocks). The default is 1Mb butwill be sized down if the max_io_size of the system is smaller. The input buffer size is: buffersize = db_block_size * db_file_direct_io_count As there are 4 input buffers, the total input buffer memory use per channel is: memory(input) = #buffers * #files * buffersize = 4 * #files * buffersize For example, if 2 channels are used, and each of these channels backs up 3 files, then for each channel memory(input) = 4 * 3 * db_block_size * db_file_direct_io_count b. output buffers ----------------- For disk channels, the output buffer size is: buffersize = db_block_size * db_file_direct_io_count For SBT_TAPE channels, the output buffer size in Oracle8/8i is o/s dependant. (On Solaris, this defaults to 64k) On 9i/10g it defaults to 256k for all platforms. The BLKSIZE argument to 'allocate channel...' can be used to override the default value. As there are 4 output buffers, memory(output) = #buffers * buffersize = 4 * buffersize
一般来说使用该隐藏参数的默认值即可,通过重置该参数后修复启动问题:
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production PL/SQL Release 11.2.0.1.0 - Production CORE 11.2.0.1.0 Production TNS for 32-bit Windows: Version 11.2.0.1.0 - Production NLSRTL Version 11.2.0.1.0 - Production SQL> alter system reset "_db_file_direct_io_count" scope=spfile; System altered. SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ FROM SYS.x$ksppi x, SYS.x$ksppcv y WHERE x.inst_id = USERENV ('Instance') AND y.inst_id = USERENV ('Instance') AND x.indx = y.indx AND x.ksppinm LIKE '%db_file_direct_io_count%' / NAME VALUE DESCRIB ------------------------------ -------------------- ------------------------------ _db_file_direct_io_count 1048576 Sequential I/O buf size windows上的11gr2默认该参数为1MB
Leave a Reply