August 12, 2010

A Case of the ORA-600 [17182] Error

Author: mac

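This is an ancient system: SunOS 5.8 running Oracle 8.1.7.4. Recently the old veteran ran into a new problem, and the alert log started sending up smoke signals: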

Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc:
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
Thu Jul 15 16:19:29 2010
Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc:
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
Thu Jul 15 16:19:30 2010
Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc:
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
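
Entries like these are easy to tally with a short script. The following is a minimal sketch, assuming the alert log lines have exactly the format shown above; the function names `parse_ora600` and `tally_ora600` are made up for illustration and are not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches the ORA-00600 line format seen in the alert log excerpt.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)$")
# Each argument is printed inside square brackets; unused slots are "[]".
ARG_RE = re.compile(r"\[([^\]]*)\]")


def parse_ora600(line):
    """Return the non-empty ORA-00600 arguments on a line, or None."""
    match = ORA600_RE.search(line)
    if match is None:
        return None
    return [arg for arg in ARG_RE.findall(match.group(1)) if arg]


def tally_ora600(lines):
    """Count ORA-00600 occurrences, keyed by the first argument."""
    counts = Counter()
    for line in lines:
        args = parse_ora600(line)
        if args:
            counts[args[0]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Thu Jul 15 16:19:29 2010",
        "Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc:",
        "ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []",
    ]
    print(tally_ora600(sample))
```

Keying on the first argument groups all [17182] hits together regardless of the second argument, which is the value that varies between reports of this error.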

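If you are as fascinated by ORA-600 errors as I am, the full trace file is well worth a read. Here are the SQL statement running when the error was raised and the call stack: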

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
Current SQL statement for this session:
select * from olsuser.cardmaster where cm_card_no between '2336330010201570013' and '2336330010201580004' union
select * from olsuser.cardmaster where cm_card_no between '2336330012402300018' and '2336330012402310009' union
select * from olsuser.cardmaster where cm_card_no between '2336330052400220016' and '2336330052400230007' union
select * from olsuser.cardmaster where cm_card_no between '2336330015103900012' and '2336330015138100032' union
select * from olsuser.cardmaster where cm_card_no between '2336330055100910018' and '2336330055100920009'
----- Call Stack Trace -----
calling                   call     entry
location                  type     point
--------------------      -------- --------------------
ksedmp()+220              CALL     ksedst()+0
kgeriv()+268              PTR_CALL 0000000000000000
kgesiv()+140              CALL     kgeriv()+0
kgesic1()+32              CALL     kgesiv()+0
kghfrf()+204              CALL     kgherror()+0
kkscls()+1592             CALL     kghfrf()+0
opicca()+248              CALL     kkscls()+0
opiclo()+8                CALL     opicca()+0
kpoclsa()+60              CALL     opiclo()+0
opiodr()+2540             PTR_CALL 0000000000000000
ttcpip()+5676             PTR_CALL 0000000000000000
opitsk()+2408             CALL     ttcpip()+0
opiino()+2080             CALL     opitsk()+0
opiodr()+2540             PTR_CALL 0000000000000000
opidrv()+1656             CALL     opiodr()+0
sou2o()+16                CALL     opidrv()+0
main()+172                CALL     sou2o()+0
_start()+380              CALL     main()+0
/* In 8.1.7 the stack trace also carries register information, but we can't read that :) */

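The call chain is opicca->kkscls->kghfrf->kgherror (the heap layer raises the error)->kgesic1, so the problem arises mainly in the call to kghfrf. The article "famous summary stack trace from Oracle Version 8.1.7.4.0 Bug Note" lists a number of Oracle stack summaries; there, the kghfrx function is described as "Free extent. This is called when a heap is unpinned to request that it". From that we can guess that kghfrf frees some kind of memory structure. Searching MOS for the keywords "kghfrf 8.1.7.4" turns up Note 291936.1: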

ORA-00600 [17182] on Oracle 8.1.7.4.0 After a CTRL-C or Client Termination

Applies to:
Oracle Server - Enterprise Edition - Version: 8.1.7.4
This problem can occur on any platform.
Checked for relevance on 06-Mar-2007
Oracle RDBMS Server Versions prior to 9i

Symptoms
1. Intermittent heap corruption errors like ORA-00600 [17182] are reported in the alert.log file.
2. There is no impact to the database other than the process which encounters the errors getting killed.
3. From the trace file generated for this ORA-00600 error, check if the top few functions are: kgherror kghfrf kkscls opicca

Cause
If the trace file shows that kkscls calls kghfrf, then it is related to:
Bug 2281320 -- ORA-600[17182] POSSIBLE AFTER CTRL-C OR CLIENT DEATH

Solution
The problem is when we call kghfrf to free a chunk of memory, we expect this chunk to have been allocated from the Heap Memory and hence have a valid header, although internally we have used Frame Memory managed chunk. As a result, kghfrf errors out with the "Bad Magic Number" in the Memory Chunk header error message.
If you are running Oracle 8174, encounter this ORA-00600 [17182], and the call stack indicates the following functions { kgherror kghfrf kkscls }, then download and apply Patch 2281320 from MetaLink.
This issue has been fixed in Oracle Server 8.1.7.5 and later versions.
Note 2281320.8 is not limited to dblinks and can occur during normal database operation as well.
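
The failure mode the note describes can be pictured with a toy model. This is only a sketch: the `ToyMemory` class, the header layout and the `MAGIC` value are all invented for illustration and say nothing about Oracle's real KGH chunk format. It shows one thing only, namely that a free routine which validates a header will reject a chunk that some other allocator handed out.

```python
# Toy model only: MAGIC and the header layout are invented for illustration
# and do not reflect Oracle's actual heap chunk format.
MAGIC = 0xB38F


class BadMagicNumber(Exception):
    """Raised when a chunk header does not carry the expected magic value."""


class ToyMemory:
    def __init__(self):
        self.headers = {}    # address -> header word stored with the chunk
        self._next = 0x1000  # next free address in this toy address space

    def _reserve(self, size):
        addr = self._next
        self._next += size
        return addr

    def heap_alloc(self, size):
        """Heap-managed chunk: the allocator stamps a valid header."""
        addr = self._reserve(size)
        self.headers[addr] = MAGIC
        return addr

    def frame_alloc(self, size):
        """Frame-managed chunk: no heap header, just arbitrary bytes."""
        addr = self._reserve(size)
        self.headers[addr] = 0x3B  # arbitrary stand-in for "not a heap header"
        return addr

    def heap_free(self, addr):
        """Free a chunk, assuming it came from heap_alloc."""
        found = self.headers[addr]
        if found != MAGIC:
            raise BadMagicNumber(f"chunk {addr:#x}: bad magic number ({found:#x})")
        del self.headers[addr]


if __name__ == "__main__":
    mem = ToyMemory()
    mem.heap_free(mem.heap_alloc(56))       # heap chunk: freed without complaint
    try:
        mem.heap_free(mem.frame_alloc(56))  # frame chunk: header check fails
    except BadMagicNumber as exc:
        print("free failed:", exc)
```

In the toy model nothing is actually corrupted. The free routine is simply handed memory it does not manage.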

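The note explains that releases before 9i can raise ORA-00600 [17182] because of heap corruption. The error causes no fatal problem and no database corruption; at worst, the server process that hits it is killed. The main criterion for matching this issue is a stack trace of kgherror kghfrf kkscls opicca, which is exactly what we see. The internal error can be avoided by applying one-off patch 2281320 or by upgrading to 8.1.7.5. It can also simply be ignored, since it clearly does little harm. As for the mechanism: kghfrf frees a memory chunk, and Oracle development originally assumed that every chunk it might be asked to free had been allocated from heap memory and therefore carried a valid header. In reality some of them are chunks managed as frame memory. kghfrf reads a bad magic number from the header of such a chunk and goes astray.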
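
The matching criterion can be checked mechanically. Below is a minimal sketch that assumes the Solaris 8.1.7 stack trace layout shown earlier in this post (`name()+offset`, a `CALL` or `PTR_CALL` column, then the entry point); the helper names `parse_frames` and `matches_bug_2281320` are hypothetical.

```python
import re

# Caller chain that Note 291936.1 gives as the signature of Bug 2281320.
SIGNATURE = ["kghfrf", "kkscls", "opicca"]

# One frame of the trace, e.g. "kghfrf()+204   CALL   kgherror()+0".
FRAME_RE = re.compile(r"^(\w+)\(\)\+\d+\s+(?:CALL|PTR_CALL)\s+(\S+)")


def parse_frames(trace_text):
    """Return (caller, callee) name pairs from a Call Stack Trace section."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            callee = match.group(2).split("(")[0]  # drop the "()+0" suffix
            frames.append((match.group(1), callee))
    return frames


def matches_bug_2281320(trace_text):
    """True if kghfrf raised kgherror and was called via kkscls from opicca."""
    frames = parse_frames(trace_text)
    callers = [caller for caller, _ in frames]
    for i, (caller, callee) in enumerate(frames):
        if caller == "kghfrf" and callee == "kgherror":
            return callers[i:i + 3] == SIGNATURE
    return False


if __name__ == "__main__":
    sample = """\
kgesic1()+32              CALL     kgesiv()+0
kghfrf()+204              CALL     kgherror()+0
kkscls()+1592             CALL     kghfrf()+0
opicca()+248              CALL     kkscls()+0
opiclo()+8                CALL     opicca()+0
"""
    print(matches_bug_2281320(sample))
```

The check is deliberately strict: a stack in which another function sits between kghfrf and kkscls is reported as a non-match, so it needs a manual look rather than being assumed to be the same bug.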

admin 2010-08-12

Hdr: 2281320 8.1.7.3.0 RDBMS 8.1.7.3.0 PRG INTERFACE PRODID-5 PORTID-87 ORA-600
Abstract: ORA-600[17182] POSSIBLE AFTER CTRL-C OR CLIENT DEATH

PROBLEM:
--------
Regularly an ora-600[17182] is generated. Checking for this kind of file is done automatically, and the DBAs are informed about this error and have to check it at once (even in the middle of the night). Except for memory corruption there does not seem to be an impact on the database.

DIAGNOSTIC ANALYSIS:
--------------------
Have checked the objects in question:
- problem occurs on different tables with different queries
- execution plan shows usage of bitmap index as well as FTS scans
No regular plan can be found in it.
Setting of diagnostic event 10235 with level 4 seems to introduce ora-4030, so it had to be put off.
Patch for 2177050 has been installed, but the problem occurred before and after installation of this patch.
According to the customer the same error occurred in 8.1.7.2.0 as well.

WORKAROUND:
-----------
Have not found any.

RELATED BUGS:
-------------
Have not found any.

REPRODUCIBILITY:
----------------
Not reproducible at will.

TEST CASE:
----------
Not applicable.

STACK TRACE:
------------
*** 17:30:34.123
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [17182], [1075716168], [], [], [], [], [], []
Current SQL statement for this session:
select * from rcv a where a.clne_seq in (select clne_seq from cdt_lne where val_day = 0) and (a.dte_due + 1 )= (select dte_nxt_pay from its_per b where b.iper_seq = a.iper_seq)
----- Call Stack Trace -----
*** 17:30:45.045
link and map addresses differ for /oracle/app/oracle/product/8.1.7/lib/libobk.so - 3ffbffe0000, 30000000000
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp:1838[kse.c]   ???      ksedst:2205[kse.c]   12071E6BC ? 0380003D8 ? 1401EB838 ? 100000018 ? 1214707E8 ? 0380003D8 ?
ksfdmp:917[ksf.c]    ???      ksedmp:1838[kse.c]   121470854 ? 00000431E ? 000000000 ? 000000000 ? 000000001 ? 11FFFCAB0 ?
kgeriv:1451[kge.c]   ???      ksfdmp:917[ksf.c]    000000000 ? 000000000 ? 000000001 ? 11FFFCAB0 ? 100000018 ? 121471014 ?
kgesiv:1679[kge.c]   JSR      kgeriv:1451[kge.c]   121470C48 ? 0380003D8 ? 1401EEAF8 ? 11FFFCAB0 ? 100000018 ? 10000431E ?
kgesic1:1558[kge.c]  ???      kgesiv:1679[kge.c]   12145F8B4 ? 11FFFCAB0 ? 100000018 ? 000000000 ? 000000000 ? 038000000 ?
kgherror:569[kgh.c]  ???      kgesic1:1558[kge.c]  000000000 ? 000000008 ? 000000010 ? 1214677DC ? 0380003D8 ? 1401EEAF8 ?
kghfrf:5102[kgh.c]   ???      kgherror:569[kgh.c]  000000001 ? 1206E9D84 ? 1401F9E38 ? 038000000 ? 000000000 ? 000000024 ?
kkscls:3728[kks.c]   ???      kghfrf:5102[kgh.c]   120FDF020 ? 000000003 ? 1401F1638 ? 1401F9E38 ? 038004318 ? 000000000 ?
opicca:145[opicca.c  JSR      kkscls:3728[kks.c]   120E202E0 ? 038000000 ? 000000001 ? 000000001 ? 120BB502C ? 11FFFD660 ?
opiclo:79[opiclo.c]  JSR      opicca:145[opicca.c  120BB502C ? 11FFFD660 ? 000000003 ? 11FFFD660 ? 1206408D4 ? 038000000 ?
...

SUPPORTING INFORMATION:
-----------------------

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------

DIAL-IN INFORMATION:
--------------------

IMPACT DATE:
------------

Files will be uploaded to ess30

The ora-600[729] has been solved by installing patch for 2177050.

The uploaded traces are erroring in kkscls -> kghfrf during closing of cursors. The corruption is in private memory, but the bad chunk does not appear in the session heap. Can you add the following to see if we can get closer to the cause:
event="600 trace name heapdump level 5125"
event="10501 trace name context forever, level 4109"
Also can you upload the alert log extract and init.ora parameter settings.

The problem has re-occurred after enabling the above events; have uploaded the 17182.zip file the cust has provided.

Cust provided a new tracefile containing the ora-600[17182]; have uploaded files of customer: trace + alertfile: 17182_18apr2002.zip

An ORA-3113 is seen from the DBLINK during execution of this statement:
select * from pmm_rcp_pmm order by dte_sta_pmm_rcp_pmm
This occurs in the stack:
ksesec0 ksucin srsmr1 srsrel sorrelqb qersoRelease rwsrld qecrlssub opifch opiall0 kpoal8
Note: The local error is only occurring as the DB link signals an ORA-3113. This implies the remote end of the DB link is failing. You should find out if that is due to an unexpected process death and if so follow that up as a separate issue. The issue here is that in 8i an ORA-3113 from a DB link at a particular time can cause a local ORA-600 [17182] error. Please indicate if you are likely to need an 8i fix for this OERI:17182 problem so I know what action to take next. Thanks

Cust has checked the object in question and it is a local table. New info:
select * from dba_objects where object_name = 'PMM_RCP_PMM';

OWNER  OBJECT_NAME  SUBOBJECT_NAME                  OBJECT_ID DATA_OBJECT_ID
------ ------------ ------------------------------ ---------- --------------
OBJECT_TYPE        CREATED   LAST_DDL_ TIMESTAMP           STATUS  T G S
------------------ --------- --------- ------------------- ------- - - -
PUBLIC PMM_RCP_PMM                                      39282
SYNONYM            06-MAR-00 06-MAR-00 2000-03-06:19:30:41 VALID   N N N

ISR    PMM_RCP_PMM                                      39277          39277
TABLE              06-MAR-00 23-OCT-01 2000-03-06:19:29:53 VALID   N N N

Please (re-)check the tracefile for the database link.

Ooops - diagnosis is the same - it is just that the ORA-3113 is from a dead client connection not a dead DB link. ie: The client going away at an inappropriate time exposes the same hole in the code.

Please provide a backport for this problem for 8.1.7.3.0. Can a timeframe be given in which a fix can be expected?? Thanks

Rediscovery Information:
"If you get ORA-600[17182] after a ORA-3113, and cause for 3113 indicates following pattern in the error stack, then it could be this bug.
...opifch()->qecrlssub()->....ksesec0()"
]] ORA-600[17182] occurred followed by ORA-3113 when the heap dump
]] indicated that 17182 encountered while freeing the chunk marked with
]] "define-info".

admin 2010-08-12

Hdr: 2491757 8.1.7.4 RDBMS 8.1.7.4 PRG INTERFACE PRODID-5 PORTID-23 ORA-600 2281320
Abstract: ORA-600 [17182] [32227064], [], [], [], [], [], [] AND ORA-3113 ON 8.1.7.4

TAR:
----
SMS TAR 2396840.995

PROBLEM:
--------
Customer is getting this problem and the problematic session is disconnected. They are not able to reproduce this at will.

DIAGNOSTIC ANALYSIS:
--------------------
NA

WORKAROUND:
-----------
None

RELATED BUGS:
-------------

REPRODUCIBILITY:
----------------
Customer could not reproduce this at will. But we believe it will be reproduced in future.

TEST CASE:
----------
NA

STACK TRACE:
------------
ksedmp kgeriv kgesiv kgesic1 kghfrf kkscls opicca opiclo opifcs ksuxds ksudel opidcl opidrv sou2o main _start

SUPPORTING INFORMATION:
-----------------------

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
NA

DIAL-IN INFORMATION:
--------------------
NA

IMPACT DATE:
------------

Alert.log, init.ora and trace file of problem is on machine ess30 in directory /bug/bug2491757 in file bug2491757.zip

The current SQL statement is
SELECT "OSN_POD_OSEBA"."SIFRA" FROM "OSN_POD"."OSEBA" "OSN_POD_OSEBA" ORDER BY "PRIIMEK"
If this is over a DB link this is probably a duplicate of Bug.2281320 . Please confirm what "OSN_POD"."OSEBA" is.

No, this is no db_link. I've checked that.

Sorry - I should not have mentioned DB links. Bug 2281320 has nothing to do with DB links - I've corrected its title. From your trace the dump is when freeing kxscdfn in the current instantiation. This is not pointing at a KGH chunk hence the OERI:17182. This is almost certainly a duplicate of bug:2281320

If the customer needs a fix in 8174 please request a PSE to 8174 referencing this bug as evidence. It is likely this is from a dead client or a client interrupt so unless this is happening a lot you would need a good business case for a PSE. There is no actual corruption here - just a cleanup error.

admin 2010-08-12

Hdr: 3421829 8.1.7.4 RDBMS 8.1.7.4 PRODID-5 PORTID-59 ORA-600
Abstract: ORA-600 15203 AND ORA-600 17182

PROBLEM:
--------
Customer is getting intermittent ora-600 errors. The TAR was originally opened for the ora-600 17182 errors. However, since asking the customer to set event 10235 but before setting it, he has encountered ora-600 15203 which seems to have spawned more ora-600 17182 errors. He cannot reproduce the errors at will. However, they are quite frequent (occur on a daily basis).

DIAGNOSTIC ANALYSIS:
--------------------
I have asked the customer to set event 10235 level 2. However, he is concerned about a performance hit. He wanted to know if we could give him a % of performance degradation that he would encounter. I told him I would ask development. I had also asked him to set event 10501. However, I understand that this event causes more of a performance hit than the 10235. So, I don't think I can get him to set that event. The customer also wants to be sure that we will get all the information needed from setting the 10235 event. He does not want to have to set further events - causing downtime on production.

WORKAROUND:
-----------
none known

RELATED BUGS:
-------------
bug 2765055 OERI:15203 / Memory corruption if partitioned table cursor is reloaded
-----this bug has to do with partitioned tables, my ct does not use partitioned tables.

REPRODUCIBILITY:
----------------
intermittently on a daily basis

TEST CASE:
----------
none available

STACK TRACE:
------------
/opt/oracle/admin/MOVE/udump/ora_7255_move.trc
===============================================
ORA-600: internal error code, arguments: [17182], [1075419096], [], [], [], [], [], []
Current SQL statement for this session:
SELECT 'x' FROM task_master WHERE TASK_ID=2953359 FOR UPDATE NOWAIT
STACK: kgherror kghfrf kxscln kkscls
Chunk 401997d8 sz= 56 ERROR, BAD MAGIC NUMBER (3b)

/opt/oracle/admin/MOVE/udump/ora_3761_move.trc
================================================
ORA-600: internal error code, arguments: [15203], [9], [5], [], [], [], [], []
Current SQL statement for this session:
select inventory_id ,product_id ,uom_family ,uom_type_code ,product_key ,location_is_lp_ind ,physical_location_no ,onhand_quantity ,inbound_quantity ,outbound_quantity ,material_status_code ,material_keepers_ref ,inventory_type ,inventory_status from inventory where (location_no=:b0 and onhand_quantity>0) order by product_id asc
STACK: ksesic2 kksfal

Comments (3)