SGA中的内存池包含不同大小的内存块。当数据库启动时,就有一个大的内存块分配并被hush buckets 里的空闲列表追踪。随着时间推移,随着内存的分配和释放,内存块被按照大小在不同的hush buckets间移动。当SGA里任何一个内存池里出现不能满足内部分配请求的情况时,ORA-04031就出现了。
shared pool共享池的管理方式不同于其它的内存池。。共享池存放与数据字典和library cache有关的信息。但是,这些内存区域根据空闲列表和最近使用算法(LRU)管理。当在共享池的所有搜索结束后,从LRU列表清除所有的可能清除的对象, 多次扫描空闲列表后,仍没有找到内存块,ORA-04031就出现了。这意味着ORA-04031很难预测。
对共享池的监测,可以看它是否包含许多类似的SQL,只有文字不同。 这种情况会占用更多的共享池内存并引共享池碎片,过多的共享池碎片(fragment)会导致虽然共享池中仍有大量的free memory,但都是尺寸较小的内存块(chunk),当Oracle进程申请一些较大的连续内存空间(memory chunk)时,虽然共享池中的free memory大小远大于申请的连续空间大小,仍会引发ORA-4031错误。使用绑定变量可以使SQL 共享。使用本文所附的脚本可以查出内存中是否有许多类似SQL。
即使使用了绑定变量后,仍然可能存在高version count(子指针)的情况。为了使子指针共享,CURSOR_SHARING参数可能需要调整。metalink 文档Note 296377.1 和 261020.1可以提供详细信息。若造成4031的原因是由于未绑定变量或者游标无法共享导致的过度硬解析(Hard Parse),则应当调整应用绑定变量或者调整初始化参数。
9i中开始引入shared pool subpool子池技术。设计多个共享池子池的目的是分散单个shared pool LRU Latch的并行压力。 ,每一个子池都包含自有的LRU LIST和保留区域等其他内存结构。 subpools子池的数量会在Oracle实例启动(startup nomount)时根据SGA_MAX_SIZE(或11g中的memory_max_target)以及服务器上的CPU数量而自动决定。子池数量最多为7个,在Card系统中为4个。
每一个subpool都是一个"迷你型"的共享池,其包括自有的Freelist、内存结构记录和LRU LIST。subpool子池技术是对shared pool并发扩张能力的增强,且每一个子池现在都使用独立的shared pool child latch来保护。这意味着不再像9i以前的版本那样因为只有一个shared pool latch而剧烈争用了。
但是在实际使用中发现版本9i中仍存在一些BUG,可能导致在子池之间的内存使用分布不平衡均匀,这可能导致虽然部分子池使用率不高,但是个别子池内存过度分配,从而导致ORA-4031错误。一般把这种现象称作"subpool imbalance"。
分析问题发生时段的ALERT.LOG内容如下:
Wed Nov 21 02:16:01 2012
ARC1: Evaluating archive log 3 thread 1 sequence 2373
ARC1: Beginning to archive log 3 thread 1 sequence 2373
Creating archive destination LOG_ARCHIVE_DEST_1: '/card_arch1/1_2373.arc'
ARC1: Completed archiving log 3 thread 1 sequence 2373
Wed Nov 21 16:11:46 2012
Errors in file /oracle/admin/card/udump/card1_ora_6782.trc:
ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","select rowid, count(*) as TO...","sga heap(1,0)","library cache")
ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","select rowid, count(*) as TO...","sga heap(1,0)","library cache")
Wed Nov 21 16:11:47 2012
Trace dumping is performing id=[cdmp_20121121161147]
Wed Nov 21 16:14:10 2012
Errors in file /oracle/admin/card/udump/card1_ora_6682.trc:
ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","TD_REPTCASH","sga heap(1,0)","library cache")
ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","TD_REPTCASH","sga heap(1,0)","library cache")
Wed Nov 21 16:14:11 2012
Trace dumping is performing id=[cdmp_20121121161411
在16:11:46时SPID=6782的前台服务进程报ORA-4031错误,分析该进程的TRACE如下:
/oracle/admin/card/udump/card1_ora_6782.trc
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /oracle/product/9.2
System name: HP-UX
Node name: ecprod1
Release: B.11.31
Version: U
Machine: ia64
Instance name: card1
Redo thread mounted by this instance: 1
Oracle process number: 114
Unix process pid: 6782, image: oracle@ecprod1 (TNS V1-V3)
*** 2012-11-21 16:06:40.006
*** SESSION ID:(498.18090) 2012-11-21 16:06:40.005
=================================
Begin 4031 Diagnostic Information
=================================
The following information assists Oracle in diagnosing
causes of ORA-4031 errors. This trace may be disabled
by setting the init.ora parameter _4031_dump_bitvec = 0
======================================
Allocation Request Summary Information
======================================
Current information setting: 00654fff
Dump Interval=300 seconds SGA Heap Dump Interval=3600 seconds
Last Dump Time=11/21/2012 16:06:39
Allocation request for: kkslpkp - literal info.
Heap: c000000353cb25d0, size: 4200
******************************************************
HEAP DUMP heap name="sga heap" desc=c000000381602030
extent sz=0xfe0 alt=200 het=32767 rec=9 flg=-126 opc=0
parent=0000000000000000 owner=0000000000000000 nex=0000000000000000 xsz=0x1
******************************************************
HEAP DUMP heap name="library cache" desc=c000000353cb25d0
extent sz=0x348 alt=32767 het=32 rec=0 flg=2 opc=2
parent=c000000381602030 owner=c00000032f85db08 nex=0000000000000000 xsz=0x1
Subheap has 4712 bytes of memory allocated
====================
Process State Object
====================
----------------------------------------
SO: c0000002e93dac10, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=114, calls cur/top: c00000030275a888/c00000030275a888, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 111
last post received-location: kglpndl: post after freeing latch
last process to post me: c0000002e93dd1b0 137 0
last post sent: 1298803 0 16
last post sent-location: ksasnd
last process posted by me: c0000002e93bb2d0 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: c0000002ea3a4ff0
O/S info: user: oracle, term: UNKNOWN, ospid: 6782
OSD pid info: Unix process pid: 6782, image: oracle@ecprod1 (TNS V1-V3)
=========================
User Session State Object
=========================
----------------------------------------
SO: c0000002ea3fc930, type: 4, owner: c0000002e93dac10, flag: INIT/-/-/0x00
(session) trans: 0000000000000000, creator: c0000002e93dac10, flag: (100041) USR/- BSY/-/-/-/-/-
DID: 0001-0072-00161539, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: c000000347a44a50, psql: c0000003009a42e0, user: 26/OC_MES
O/S info: user: root, term: , ospid: 1234, machine: ecrsouterapp1
program:
last wait for 'latch free' blocking sess=0x0 seq=33786 wait_time=12162
address=c00000030245cdc8, number=9d, tries=0
temporary object counter: 0
=========================
Current Parent KGL Object
=========================
LIBRARY OBJECT HANDLE: handle=c000000347a44a50
name=select rowid, count(:"SYS_B_0") amount from te_enterpriseinfo_cur a where a.borrowerid = :"SYS_B_1" and a.regionalismcode like :"SYS_B_2"
hash=5740b5 timestamp=11-21-2012 16:06:39
namespace=CRSR flags=RON/KGHP/TIM/PN0/MED/[50010000]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=X latch#=9
lwt=c000000347a44a80[c000000347a44a80,c000000347a44a80] ltm=c000000347a44a90[c000000347a44a90,c000000347a44a90]
pwt=c000000347a44ab0[c000000347a44ab0,c000000347a44ab0] ptm=c000000347a44b40[c000000347a44b40,c000000347a44b40]
ref=c000000347a44a60[c000000347a44a60, c000000347a44a60] lnd=c000000347a44b58[c000000347a44b58,c000000347a44b58]
LIBRARY OBJECT: object=c00000037760ea28
type=CRSR flags=EXS[0001] pflags= [00] status=VALD load=0
CHILDREN: size=16
child# table reference handle
------ -------- --------- --------
0 c00000037760ec88 c00000035b9ba6a8 c000000353207a90
C000000000000004 ?
C000000000032F90 ?
----- End of Call Stack Trace -----
===============================
Memory Utilization of Subpool 1
===============================
Allocation Name Size
_________________________ __________
"free memory " 82400392
"miscellaneous " 43284312
"type object de " 0
"parameters " 0
"KCL name table " 3673416
"MTTR advisory " 98256
"Checkpoint queue " 9177280
"db_block_hash_buckets " 9699328
"PL/SQL MPCODE " 0
"trigger inform " 0
"sim memory hea " 1279200
"KCL extra lock elements " 864000
"PL/SQL PPCODE " 0
"ges reserved msg buffers " 2096008
"fixed allocation callback" 744
"pl/sql source " 0
"KQR M PO " 189272
"trigger source " 0
"ges enqueues " 3261024
"KSXR pending messages que" 853952
"gcs shadows " 12303112
"ges resource hash table " 1048576
"dictionary cache " 1068608
"ges resources " 1592696
"enqueue resources " 572504
"errors " 0
"PL/SQL DIANA " 0
"library cache " 439251408
"KQR L PO " 266936
"sql area " 64536
"sessions " 749008
"gcs resources " 20555384
"KQR S SO " 2048
"trigger defini " 0
"event statistics per sess" 3137568
"KGLS heap " 26608
"KQR X PO " 18032
"table definiti " 0
===============================
Memory Utilization of Subpool 2
===============================
Allocation Name Size
_________________________ __________
"free memory " 108187608
"miscellaneous " 37722824
"table definiti " 1984
"ges enqueues " 3395408
"db_block_hash_buckets " 9794640
"ksm_file2sga region " 370496
"PL/SQL DIANA " 175992
"KQR S SO " 512
"KQR L PO " 633360
"gcs shadows " 28408552
"sim memory hea " 1286432
"parameters " 206424
"1M buffer " 1049088
"errors " 40280
"type object de " 0
"KQR M PO " 804672
"trigger defini " 272
"trigger source " 88
"partitioning d " 22728
"channel handle " 260392
"KGK heap " 552
"db_files " 253952
"dictionary cache " 1068608
"KQR X PO " 23184
"gcs resources " 36658760
"transaction " 480384
"trigger inform " 64
"library cache " 159389832
"ges resources " 1482984
"sql area " 226277160
"sessions " 746304
"Checkpoint queue " 9177280
"event statistics per sess" 3148936
"fixed allocation callback" 720
"FileIdentificatonBlock " 3580496
"KGLS heap " 381736
"ktlbk state objects " 651240
"PL/SQL MPCODE " 1850264
===============================
Memory Utilization of Subpool 3
===============================
Allocation Name Size
_________________________ __________
"free memory " 108746832
"miscellaneous " 43190000
"KGLS heap " 340640
"gcs resource hash table " 16105280
"trigger source " 80
"joxs heap init " 4240
"PL/SQL PPCODE " 23624
"errors " 5592
"KQR L PO " 612576
"trigger inform " 64
"table definiti " 1560
"sim memory hea " 1279200
"MTTR advisory " 106368
"1M buffer " 1049088
"ges regular msg buffers " 1088248
"gcs resources " 20555384
"KQR M PO " 805872
"pl/sql source " 0
"KQR X PO " 59248
"KQR S SO " 2816
"ges enqueues " 3985496
"ges resources " 1917944
"dictionary cache " 1068608
"trigger defini " 9008
"type object de " 0
"PL/SQL DIANA " 410776
"parameters " 187768
"library cache " 197201256
"processes " 1376000
"sql area " 209882928
"sessions " 746304
"qmps connections " 486200
"event statistics per sess" 3137568
"Checkpoint queue " 9177280
"PLS non-lib hp " 2088
"gcs shadows " 12306208
"fixed allocation callback" 680
"PL/SQL MPCODE " 1661384
===============================
Memory Utilization of Subpool 4
===============================
Allocation Name Size
_________________________ __________
"free memory " 82782928
"miscellaneous " 40984472
"ges resources " 1783744
"db_handles " 1160000
"parameters " 323936
"sim memory hea " 1279200
"KGLS heap " 697784
"KQR M PO " 791440
"ges big msg buffers " 4674088
"temporary tabl " 0
"PL/SQL DIANA " 146680
"partitioning d " 67896
"KGK heap " 65584
"KQR L PO " 600312
"Checkpoint queue " 9177280
"PL/SQL MPCODE " 3006528
"enqueue " 2332272
"FileOpenBlock " 16104056
"KSXR receive buffers " 1034000
"table definiti " 3272
"KQR X PO " 46368
"ges enqueues " 4135704
"dictionary cache " 1068608
"gcs shadows " 12306208
"KQR S SO " 2304
"PX subheap " 54640
"trigger inform " 680
"library cache " 138469232
"gcs resources " 20555384
"sql area " 271924816
"sessions " 746304
"errors " 15304
"event statistics per sess" 3137568
"ges process array " 1273272
"pl/sql source " 0
"trigger defini " 4400
"type object de " 0
"fixed allocation callback" 728
LIBRARY CACHE STATISTICS:
namespace gets hit ratio pins hit ratio reloads invalids
-------------- --------- --------- --------- --------- ---------- ----------
CRSR 1044847223 0.999 1198493483 0.551 507345521 90137
TABL/PRCD/TYPE 1190948651 1.000 2350380701 1.000 125865 0
BODY/TYBD 568579 1.000 568659 0.999 71 0
TRGR 12494087 1.000 12494089 1.000 774 0
INDX 2091941 0.999 2091942 0.999 34 0
CLST 68101 0.998 83171 0.996 0 0
OBJE 0 0.000 0 0.000 0 0
PIPE 0 0.000 0 0.000 0 0
LOB 0 0.000 0 0.000 0 0
DIR 348 0.994 216 0.778 22 0
QUEU 0 0.000 0 0.000 0 0
OBJG 0 0.000 0 0.000 0 0
PROP 0 0.000 0 0.000 0 0
JVSC 502 0.578 502 0.578 0 0
JVRE 502 0.584 502 0.584 0 0
ROBJ 0 0.000 0 0.000 0 0
REIP 0 0.000 0 0.000 0 0
CPOB 0 0.000 0 0.000 0 0
EVNT 11919560 1.000 17875331 1.000 1680 0
SUMM 0 0.000 0 0.000 0 0
DIMN 0 0.000 0 0.000 0 0
CTX 0 0.000 0 0.000 0 0
OUTL 0 0.000 0 0.000 0 0
RULS 0 0.000 0 0.000 0 0
RMGR 0 0.000 0 0.000 0 0
IFSD 0 0.000 0 0.000 0 0
PPLN 0 0.000 0 0.000 0 0
PCLS 0 0.000 0 0.000 0 0
SUBS 0 0.000 0 0.000 0 0
LOCS 0 0.000 0 0.000 0 0
RMOB 0 0.000 0 0.000 0 0
RSMD 0 0.000 0 0.000 0 0
JVSD 0 0.000 0 0.000 0 0
ENPR 0 0.000 0 0.000 0 0
RELC 0 0.000 0 0.000 0 0
STREAM 0 0.000 0 0.000 0 0
APPLY 0 0.000 0 0.000 0 0
APPLY SOURCE 0 0.000 0 0.000 0 0
APPLY DESTN 0 0.000 0 0.000 0 0
TEST 0 0.000 0 0.000 0 0
CUMULATIVE 2262939494 0.999 3581988596 0.850 507473967 90137
当Oracle进程触发ORA-4031错误后会产生4031相关的TRACE,其中包括了当时简要的shared pool使用情况; 分析上述4031 trace中的信息可以发现存在4个subpool (Memory Utilization of Subpool 1、Memory Utilization of Subpool 2、Memory Utilization of Subpool 3、Memory Utilization of Subpool 4)。
以上4个subpool的主要使用情况如下:
|
Subpool1 |
Subpool2 |
Subpool3 |
Subpool4 |
Total Size |
| Free Memory |
82400392 |
108187608 |
108746832 |
82782928 |
364MB |
| library cache |
439251408 |
159389832 |
197201256 |
138469232 |
947MB |
| Sql area |
64536 |
226277160 |
209882928 |
271924816 |
675MB |
| Total size |
637531776 |
637531776 |
637531776 |
620754624 |
2415MB |
从以上表格可以看到library cache和SQL Area占用了1600MB的共享池空间,而实际free memory的总数也达到364MB,实际引发ORA-4031时所申请的chunk大小仅为4216 bytes。
进一步分析可以发现在subpool1中的sql area仅占用64536字节,而subpool 4中sql area占用271924816字节,说明subpool1与其他subpool之间在空间使用上存在不均衡。
为了进一步分析Card库的共享池使用情况,我们通过动态性能视图收集了11-23日 14时当前的shared pool使用状态,由于并非在ORA-4031当时所收集的信息,虽然不能作为诊断该问题的直接证据,但同样具有参考意义:
BUCKET KSMCHCLS subpool From Count Biggest AvgSize Total
-------------------------------- -------- ---------- ---------- ---------- ---------- ---------- ----------
0 (<140) free 1 40 1443 48 43 62112
0 (<140) free 1 50 373 56 56 20888
0 (<140) free 1 60 537 64 64 34368
0 (<140) free 1 70 364 72 72 26208
0 (<140) free 1 80 2778 88 86 241024
0 (<140) free 1 90 206 96 96 19776
0 (<140) free 1 100 230 104 104 23920
0 (<140) free 1 110 602 112 112 67424
0 (<140) free 1 120 706 128 124 88112
0 (<140) free 1 130 199 136 136 27064
0 (<140) free 2 40 1031 48 43 45232
0 (<140) free 2 50 766 56 56 42896
0 (<140) free 2 60 616 64 64 39424
0 (<140) free 2 70 374 72 72 26928
0 (<140) free 2 80 3421 88 86 295744
0 (<140) free 2 90 338 96 96 32448
0 (<140) free 2 100 451 104 104 46904
0 (<140) free 2 110 325 112 112 36400
0 (<140) free 2 120 1458 128 121 176832
0 (<140) free 2 130 233 136 136 31688
0 (<140) free 3 40 1604 48 44 70896
0 (<140) free 3 50 640 56 56 35840
0 (<140) free 3 60 694 64 64 44416
0 (<140) free 3 70 680 72 72 48960
0 (<140) free 3 80 3006 88 85 258256
0 (<140) free 3 90 615 96 96 59040
0 (<140) free 3 100 602 104 104 62608
0 (<140) free 3 110 487 112 112 54544
0 (<140) free 3 120 1298 128 123 160360
0 (<140) free 3 130 340 136 136 46240
0 (<140) free 4 40 1110 48 43 48424
0 (<140) free 4 50 758 56 56 42448
0 (<140) free 4 60 557 64 64 35648
0 (<140) free 4 70 451 72 72 32472
0 (<140) free 4 80 2678 88 85 230072
0 (<140) free 4 90 305 96 96 29280
0 (<140) free 4 100 333 104 104 34632
0 (<140) free 4 110 303 112 112 33936
0 (<140) free 4 120 1353 128 121 164832
0 (<140) free 4 130 243 136 136 33048
1 (140-267) free 1 200 1001 216 204 205048
1 (140-267) free 1 140 378 152 148 55968
1 (140-267) free 1 160 568 176 164 93208
1 (140-267) free 1 180 178 192 188 33608
1 (140-267) free 1 220 1536 232 231 355320
1 (140-267) free 1 240 449 256 249 112048
1 (140-267) free 1 260 135 264 264 35640
1 (140-267) free 2 200 1028 216 206 212768
1 (140-267) free 2 140 774 152 148 114560
1 (140-267) free 2 160 1293 176 171 221352
1 (140-267) free 2 180 576 192 189 109144
1 (140-267) free 2 220 377 232 228 86184
1 (140-267) free 2 240 921 256 250 230856
1 (140-267) free 2 260 272 264 264 71808
1 (140-267) free 3 200 1078 216 205 221824
1 (140-267) free 3 140 874 152 148 129448
1 (140-267) free 3 160 1642 176 168 276560
1 (140-267) free 3 180 2612 192 185 483816
1 (140-267) free 3 220 443 232 227 100824
1 (140-267) free 3 240 858 256 248 213608
1 (140-267) free 3 260 193 264 264 50952
1 (140-267) free 4 200 741 216 206 152928
1 (140-267) free 4 140 546 152 148 80816
1 (140-267) free 4 160 1164 176 169 197160
1 (140-267) free 4 180 408 192 187 76632
1 (140-267) free 4 220 304 232 227 69216
1 (140-267) free 4 240 578 256 249 144200
1 (140-267) free 4 260 172 264 264 45408
2 (268-523) free 1 300 494 336 320 158368
2 (268-523) free 1 250 450 296 284 127960
2 (268-523) free 2 300 719 336 320 230744
2 (268-523) free 2 500 20 520 518 10360
2 (268-523) free 2 250 655 296 283 185744
2 (268-523) free 2 450 51 496 480 24488
2 (268-523) free 3 300 956 336 322 308272
2 (268-523) free 3 250 641 296 281 180688
2 (268-523) free 4 300 636 336 320 203920
2 (268-523) free 4 400 46 424 409 18840
2 (268-523) free 4 250 582 296 283 165232
2 (268-523) free 4 350 23 392 383 8824
3-5 (524-4107) free 1 500 1 776 776 776
3-5 (524-4107) free 1 3000 1 3160 3160 3160
3-5 (524-4107) free 2 500 306 984 700 214312
3-5 (524-4107) free 2 1000 123 1496 1201 147760
3-5 (524-4107) free 2 1500 1139 1992 1878 2139208
3-5 (524-4107) free 2 2000 1444 2496 2322 3353672
3-5 (524-4107) free 2 2500 4534 2992 2751 12476912
3-5 (524-4107) free 2 3000 7520 3496 3211 24151448
3-5 (524-4107) free 2 3500 6105 3992 3770 23018216
3-5 (524-4107) free 2 4000 2403 4104 4065 9769384
3-5 (524-4107) free 3 500 652 992 858 559536
3-5 (524-4107) free 3 1000 1897 1496 1162 2204544
3-5 (524-4107) free 3 1500 289 1992 1726 498840
3-5 (524-4107) free 3 2000 2276 2496 2260 5144800
3-5 (524-4107) free 3 2500 4021 2992 2789 11217152
3-5 (524-4107) free 3 3000 1822 3496 3279 5975424
3-5 (524-4107) free 3 3500 11485 3992 3785 43475232
3-5 (524-4107) free 3 4000 2710 4104 4048 10970720
3-5 (524-4107) free 4 500 44 992 966 42512
3-5 (524-4107) free 4 1000 238 1488 1177 280248
3-5 (524-4107) free 4 1500 152 1968 1765 268392
3-5 (524-4107) free 4 2000 2015 2496 2376 4789544
3-5 (524-4107) free 4 2500 3400 2992 2754 9364976
3-5 (524-4107) free 4 3000 2037 3496 3287 6696072
3-5 (524-4107) free 4 3500 4731 3992 3710 17552504
3-5 (524-4107) free 4 4000 2043 4104 4073 8321968
6+ (4108+) free 1 4000 8 4712 4250 34000
6+ (4108+) free 1 5000 1 5912 5912 5912
6+ (4108+) free 1 7000 1 7048 7048 7048
6+ (4108+) free 1 13000 1 13416 13416 13416
6+ (4108+) free 1 15000 1 15752 15752 15752
6+ (4108+) free 1 19000 1 19600 19600 19600
6+ (4108+) free 1 21000 1 21360 21360 21360
6+ (4108+) free 1 25000 1 25232 25232 25232
6+ (4108+) free 2 4000 100 4272 4146 414648
6+ (4108+) free 2 8000 4 8456 8338 33352
6+ (4108+) free 2 11000 1 11872 11872 11872
6+ (4108+) free 3 4000 145 4720 4132 599248
6+ (4108+) free 3 8000 4 8664 8432 33728
6+ (4108+) free 4 4000 90 4528 4152 373728
6+ (4108+) free 4 6000 1 6896 6896 6896
6+ (4108+) free 4 7000 3 7848 7626 22880
6+ (4108+) free 4 8000 3 8456 8360 25080
以上列出了各subpool freelist bucket的情况,6+ (4108+) 代表存放大于4108bytes的free chunk的hash bucket, 可以看到大于4108bytes的free chunk总大小越为1.6M, 而3-5 (524-4107) bucket中chunk数量最多、总大小也最大。 由此可知card库共享池存在大量碎片。过多的共享池碎片导致连续尺寸大约4108字节的free chunk越来越少,当16:11时需要分配4216字节连续chunk时,由于无法分配该连续空间而触发了ORA-4031错误。
综上所述是由于共享池碎片和subpool内存分配不均衡最终引发了此次ORA-4031 。
18:33时触发过ORA-4031的Oracle进程陆续报ORA-00600: [729], [8680], [space leak]错误,该错误的stack call 如下 ksudel=> ksuxds=> ksmugf=> cold_ksmuhe
*** 2012-11-21 18:33:35.807
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [729], [8680], [space leak], [], [], [], [], []
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+528 call _etext_f()+23058430 000000000 ?
09110304648 C000000000000996 ?
400000000285CD00 ?
ksfdmp()+64 call _etext_f()+23058430 000000003 ?
09110304648
kgeriv()+432 call _etext_f()+23058430 600000000004F280 ?
09110304648 000000003 ?
C000000000000716 ?
40000000052FF030 ?
00000863F ?
60000000004B5788 ?
000000000 ? 000000000 ?
kgesiv()+208 call _etext_f()+23058430 600000000004F280 ?
09110304648 60000000004B7A08 ?
60000000000502F8 ?
600000000001D800 ?
600000000001D790 ?
ksesic2()+208 call _etext_f()+23058430 600000000004F280 ?
09110304648 60000000005E5198 ?
0000002D9 ? 000000002 ?
9FFFFFFFFFFF9418 ?
$cold_ksmuhe()+432 call _etext_f()+23058430 0000002D9 ?
09110304648 60000000004C1680 ?
9FFFFFFFFFFF9418 ?
60000000004BFFD0 ?
00000000A ?
4000000000874260 ?
60000000005E1780 ?
000000000 ?
ksmugf()+688 call _etext_f()+23058430 0000021E8 ?
09110304648 600000000004F060 ?
60000000005E1778 ?
000000001 ?
C000000000000B9C ?
ksuxds()+4320 call _etext_f()+23058430 4000000000C9B4B8 ?
09110304648 C0000002EB3B47B8 ?
400000000086A4C0 ?
C000000000001736 ?
4000000001EC4B60 ?
000008E73 ?
9FFFFFFFFFFF9480 ?
60000000004ABC40 ?
ksudel()+144 call 9fffffffffff94f8 C0000002EB3B4320 ?
通过上述stack call可以定位到Bug 3931332 : ORA-600 [729] LEAKS WITH ORA-4031 TRACES:
Hdr: 3931332 9.2.0.5.0 RDBMS 9.2.0.5.0 DICTIONARY PRODID-5 PORTID-46 ORA-600
Abstract: ORA-600 [729] LEAKS WITH ORA-4031 TRACES
sedmp: internal or fatal error
ORA-600: internal error code, arguments: [729], [4380], [space leak], [],
[], [], [], []
----- Call Stack Trace -----
ksmuhe ksmugf ksuxds ksudel
以上ORA-600 [729] LEAKS内部错误常由ORA-4031引发,一般通过避免ORA-4031可以绕过ORA-600错误
问题总结:
- 由于共享池碎片和subpool内存分配不均衡最终引发了此次ORA-4031,解决方案见下文建议部分
- ORA-600 [729] LEAKS内部错误由ORA-4031触发BUG 3931332引起,通过避免ORA-4031错误可以绕过该问题
- 执行statspack长时间无响应并消耗100%的CPU的问题在11-23日实际操作中未出现,由于没有当时的性能记录无法进一步追查。
1.1 建议
- 定期监控shared pool free memory空闲内存量和碎片情况,具体可以使用如下脚本:
set pages 1000 lines 120
col name for a60
col value for a30
select * from v$sgastat where pool like 'shared%' and name='free memory';
select '0 (<140)' BUCKET, KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10) "From",
count(*) "Count" , max(KSMCHSIZ) "Biggest",
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ<140
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10)
UNION ALL
select '1 (140-267)' BUCKET, KSMCHCLS, KSMCHIDX,20*trunc(KSMCHSIZ/20) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ between 140 and 267
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 20*trunc(KSMCHSIZ/20)
UNION ALL
select '2 (268-523)' BUCKET, KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ between 268 and 523
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50)
UNION ALL
select '3-5 (524-4107)' BUCKET, KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ between 524 and 4107
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500)
UNION ALL
select '6+ (4108+)' BUCKET, KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ >= 4108
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000);
SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ) SIZ,
To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')||'k' "AVG SIZE"
FROM X$KSMSP GROUP BY KSMCHCLS;
- 当shared pool 空闲内存少于80M或碎片过多时考虑alter system flush shared_pool强制冲刷共享池。若条件允许也可以考虑定期在业务空闲时段例如凌晨冲刷共享池,或者定期重启数据库实例
- 启用statspack自动收集性能快照,一般Interval为一个小时,statspack是久经考验的性能监控工具,对性能的损耗非常小。
- 考虑部署OSW性能监控工具:
OSWatcher(OS Watcher Black Box)是Oracle标准的Support工具,该工具通过一系列的SHELL脚本来收集多方面的OS信息,包括CPU、内存、网络等。
OSWatcher不会给系统带来较重的性能负载,OSWatcher本质上是调用最常见的OS命令如ps、vmstat、netstat。
通过大量广泛的OSW使用,已经证明了OSW是一套安全、可靠的资源监控工具。
更多OSW的信息可以参考以下Metalink文档:
OSW 用户手册:
OSWatcher Black Box User Guide.
What Is The OSWATCHER Black Box Effect On The Server Performance And What Are The Alternative Commands? [ID 946107.1]
中文:OSWatcher Black Box: How to improve performance and monitor your system automatically (Mandarin) [ID 1492202.1]
OSWatcher Black Box Analyzer User Guide [ID 461053.1]