March 21, 2010

对Oracle中索引叶块分裂而引起延迟情况的测试和分析

作者 · 阅读 19 分钟

在版本10.2.0.4未打上相关one-off补丁的情况下，分别对ASSM和MSSM管理模式表空间进行索引分裂测试，经过测试的结论如下：

l 在10gr2版本中MSSM方式是不能避免索引分裂引起交易超时问题；

l 10.2.0.4上的one-off补丁因为目前仅存在Linux版本，可以考虑声请补丁后具体测试（因目前没有补丁所以处于未知状态）。

l 合并索引是目前最具可行性的解决方案(alter index coalesce)。

l 最新的11gr2中经测试仍存在该问题。

具体测试过程如下：

1．自动段管理模式下的索引块分裂

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M

2 segment space management AUTO

3 extent management local uniform size 10M;

—创建自动段管理的表空间

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

Table created.

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Index created. — 创建实验对象表及索引

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000; — 插入25万条记录

250000 rows created.

SQL> commit;

Commit complete.

SQL>create table idx2 tablespace idx1 as select * from idx1 where 1=2;

Table created.

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a) —取出后端部分记录,即每250条取一条

)

where mod(rn, 250) = 0

)

933 rows created.

SQL> commit;

Commit complete.

SQL> analyze index idx1_idx validate structure; —分析原索引

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 499 0 — 未删除情况下499个叶块

SQL> delete from idx1 where a between 10127 and 243625; — 大量删除

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 499 233499 — 删除后叶块数量不变

SQL> insert into idx1 select * from idx2; — 令那些empty块不再empty，但每个块中只有一到二条记录，空闲率仍为75-100%

commit;

933 rows created.

Commit complete.

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126; — 造成leaf块分裂前提

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

997 leaf node splits

997 leaf node 90-10 splits

0 branch node splits

0 queue splits —找出当前会话目前的叶块分裂次数

SQL>insert into idx1 values (251000); — 此处确实叶块分裂

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

998 leaf node splits

998 leaf node 90-10 splits

0 branch node splits

0 queue splits — 可以看到对比之前的查询多了一个叶块分裂

SQL> set linesize 200 pagesize 1500;

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1 1603 0 271601 271601 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 156 0 82803 82803 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 177 0 3728 3728 1

insert into idx1 values (251000) — 读了那些实际不空的块，较多buffer_get

1 1409 0 40293 40293 933

insert into idx1 select * from idx2

1 240842 0 3478341 3478341 250000

SQL> insert into idx1 values (251001); — 不分裂的插入

1 row created.

SQL> commit;

Commit complete.

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

1 1603 0 271601 271601 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 156 0 82803 82803 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 9 0 1640 1640 1

insert into idx1 values (251001) —不分裂的插入，少量buffer_gets

1 177 0 3728 3728 1

insert into idx1 values (251000)

1 1409 0 40293 40293 933

insert into idx1 select * from idx2

1 240842 0 3478341 3478341 250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

如演示1所示，在自动段管理模式下大量删除后插入造成许多块为75%-100%空闲率且不完全为空，此后叶块分裂时将引起插入操作的相关前台进程扫描大量“空块“，若这些块不在内存中（引发物理读）且可能需要延迟块清除等原因时，减缓了该扫描操作的速度，造成叶块分裂缓慢，最终导致了其他insert操作被split操作所阻塞，出现enq:tx index contention等待事件。

2．手动段管理模式下的索引块分裂

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M

2 segment space management MANUAL — MSSM的情况

3 extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Table created.

SQL> SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250

Index created.

SQL> SQL> 000;

commit;

create table idx2 tablespace idx1 as select * from idx1 where 1=2;

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

commit;

250000 rows created.

SQL> SQL>

Commit complete.

SQL> SQL>

Table created.

SQL> SQL> 2 3 4 5 6 7 8 9

933 rows created.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 499 0

SQL> delete from idx1 where a between 10127 and 243625;

233499 rows deleted.

SQL> commit;

Commit complete.

SQL> insert into idx1 select * from idx2;

commit;

933 rows created.

SQL> SQL>

Commit complete.

SQL> SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL>

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

1496 leaf node splits

1496 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000); — 确实分裂

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

1497 leaf node splits

1497 leaf node 90-10 splits

0 branch node splits

0 queue splits

— 以上与ASSM时完全一致

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

1 1553 0 283301 283301 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 153 0 78465 78465 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 963 0 10422 10422 1 — 比ASSM模式下更大量的“空块”读

insert into idx1 values (251000)

1 984 0 35615 35615 933

insert into idx1 select * from idx2

1 238579 0 3468326 3469984 250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

1 1553 0 283301 283301 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 153 0 78465 78465 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 7 0 1476 1476 1

insert into idx1 values (251001) —不分裂的情况与ASSM时一致

1 963 0 10422 10422 1

insert into idx1 values (251000)

1 984 0 35615 35615 933

insert into idx1 select * from idx2

1 238579 0 3468326 3469984 250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

如演示2所示，MSSM情况下叶块分裂读取了比ASSM模式下更多的“空块“；MSSM并不能解决大量删除后叶块分裂需要扫描大量非空块的问题，实际上可能更糟糕。从理论上讲MSSM的freelist只能指出那些未达到pctfree和曾经到达pctfree后来删除记录后使用空间下降到pctused的块（doc：A free list is a list of free data blocks that usually includes blocks existing in a number of different extents within the segment. Free lists are composed of blocks in which free space has not yet reached PCTFREE or used space has shrunk below PCTUSED.），换而言之MSSM模式下”空块“会更多。

3．自动段管理模式下coalesce后的索引块分裂

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M

2 segment space management AUTO — ASSM 下 coalesce情况

3 extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Table created.

SQL> SQL>

Index created.

SQL> SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;

commit;

create table idx2 tablespace idx1 as select * from idx1 where 1=2;

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

commit;

250000 rows created.

SQL> SQL>

Commit complete.

SQL> SQL>

Table created.

SQL> SQL> 2 3 4 5 6 7 8 9

933 rows created.

SQL> SQL>

Commit complete.

SQL> SQL> SQL>

SQL>

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 499 0

SQL> delete from idx1 where a between 10127 and 243625;

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> alter index idx1_idx coalesce;

Index altered.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 33 0 — coalesc后 lf块合并了

SQL> insert into idx1 select * from idx2;

933 rows created.

SQL> SQL> commit;

Commit complete.

SQL>

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

1999 leaf node splits

1995 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000); — 确实分裂

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

2000 leaf node splits

1996 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

1 1603 0 268924 268924 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 156 0 78349 78349 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 23 0 2218 2218 1 —少量buffer gets

insert into idx1 values (251000)

1 191 0 15596 15596 933

insert into idx1 select * from idx2

1 240852 0 3206130 3206130 250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

1 1603 0 268924 268924 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 156 0 78349 78349 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 9 0 1574 1574 1

insert into idx1 values (251001)

1 23 0 2218 2218 1

insert into idx1 values (251000)

1 191 0 15596 15596 933

insert into idx1 select * from idx2

1 240852 0 3206130 3206130 250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

如演示三所示在删除后进行coalesce操作，合并操作将大量空块分离出了索引结构(move empty out of index structure)，之后的叶块分裂仅读取了少量必要的块。

4．手动段管理模式下coalesce后的索引块分裂

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M

2 segment space management MANUAL — mssm情况下 coalesce

3 extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Table created.

SQL> SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250

Index created.

SQL> SQL> 000;

commit;

create table idx2 tablespace idx1 as select * from idx1 where 1=2;

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

commit;

250000 rows created.

SQL> SQL>

Commit complete.

SQL> SQL>

Table created.

SQL> SQL> 2 3 4 5 6 7 8 9

933 rows created.

SQL> SQL>

Commit complete.

SQL> SQL> SQL>

SQL>

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 499 0

SQL> delete from idx1 where a between 10127 and 243625;

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 499 233499

SQL> alter index idx1_idx coalesce;

Index altered.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280 33 0

SQL> insert into idx1 select * from idx2;

933 rows created.

SQL> SQL> commit;

Commit complete.

SQL>

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

2502 leaf node splits

2494 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000); — 确实分裂

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

2503 leaf node splits

2495 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

1 1553 0 281059 281059 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 153 0 77817 77817 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 19 0 2010 2010 1 — 少量buffer get

insert into idx1 values (251000)

1 126 0 15364 15364 933

insert into idx1 select * from idx2

1 238644 0 3229737 3230569 250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

1 1553 0 281059 281059 933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1 153 0 77817 77817 126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1 7 0 1460 1460 1

insert into idx1 values (251001)

1 19 0 2010 2010 1

insert into idx1 values (251000)

1 126 0 15364 15364 933

insert into idx1 select * from idx2

1 238644 0 3229737 3230569 250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

如演示4所示，MSSM模式下合并操作与ASSM情况下大致一样，合并操作可以有效解决该问题。

5． Coalesce合并操作的锁影响

SQL> create table coal (t1 int);

Table created.

SQL> create index pk_t1 on coal(t1);

Index created.

SQL> begin

2 for i in 1..3000 loop

3 insert into coal values(i);

4 commit;

5 end loop;

6 end;

7 /

PL/SQL procedure successfully completed.

SQL> delete coal where t1>500;

2500 rows deleted.

SQL> commit;

Commit complete.

SQL> analyze index pk_t1 validate structure;

Index analyzed. — 注意analyze validate操作会block一切dml操作

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

8 6 2500 — 删除后的状态

此时另开一个会话，开始dml操作：

SQL> update coal set t1=t1+1;

500 rows updated.

— 回到原会话

SQL> alter index pk_T1 coalesce; — coalesce 未被阻塞

Index altered.

— 在另一个会话中commit,以便执行validate structure

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

8 3 500

— 显然coalesce的操作没有涉及有dml操作的块

在没有dml操作的情况下：

SQL> truncate table coal;

Table truncated.

SQL> begin

2 for i in 1..3000 loop

3 insert into coal values(i);

4 commit;

5 end loop;

6 end;

7 /

PL/SQL procedure successfully completed.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

8 6 0

SQL> delete coal where t1>500;

2500 rows deleted.

SQL> commit;

Commit complete.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

8 6 2500

SQL> alter index pk_t1 coalesce;

Index altered.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS LF_BLKS DEL_LF_ROWS

———- ———- ———–

8 1 0

—没有dml时，coalesce 操作涉及了所有块

如演示5所示coalesce会避开dml操作涉及的块，但在coalesec的短暂间歇出现在索引上有事务的块不会太多。且coalesce操作不会降低索引高度。

附件是关于rebuild及coalesce索引操作的详细描述：

6． Coalesce操作总结

优点：

l 是一种快速的操作，对整体性能影响最小（not performance sensitive）。

l 不会锁表，绕过有事务的索引块。

l 可以有效解决现有的问题。

l 不会降低索引高度，引起再次的root split

缺点：

l 需要针对个别对象，定期执行合并操作；无法一劳永逸地全局地解决该问题。

7． Linux 10.2.0.4上相关补丁的技术交流

Metalink bug 8286901 note中叙述了一位用户遇到相同的问题并提交了SR,当时oracle support给出了one-off补丁，但该用户在apply了该补丁后仍未解决问题。

以下为note 原文：

It is similar to bug8286901, but after applied patch8286901, still see enq tx
contentiona with high “failed probes on index block reclamation”

Issue encountered by customer and Oracle developer (Stefan Pommerenk).

He describes is thus:

"Space search performed by the index splitter can't find space in neighboring

blocks, and then instead of allocating new space, we go and continue to

search for space elsewhere, which manifests itself in block reads from disk,

block cleanouts, and subsequent blocks written due to aggressive MTTR

setting."

"To clarify: the cleanouts are not the problem per se. The culprit seems to

be that the space search performed by the index splitter can't find space in

neighboring blocks, and then instead of allocating new space, we go and

continue to search for space elsewhere, which manifests itself in block reads

from disk, block cleanouts, and subsequent blocks written due to aggressive

MTTR setting. This action has caused other sessions to get blocked on TX

enqueue contention, blocked on the splitting session. Advice was to set 10224

trace event for the splitter for a short time only in order to get

diagnostics as to why the space search rejected most blocks.

> A secondary symptom are the bitmap level 1 block updates, which may or may

not be related to the space search; I've not seen them before, maybe because

I didn't really pay attention :P , but the symptoms seen in the ASH trace

indicate it's the same problem. Someone in space mgmt has to look at it to

confirm it is the same problem."

与该用户进行了mail私下交流，他的回复：

I still have a case open with Oracle. I believe that this is a bug in the Oracle code. The problem is that it has been difficult to create a reproducible test case for Oracle support. My specific issue was basically put on hold pending the results of another customer’s service request that appeared to have had the same issue, (9034788). Unfortunately they couldn’t reproduce the issue in that case either.

I believe that there is a correlation between the enq TX – index contention wait event and a spike in the number of ‘failed probes on index block reclamation. I have specifically asked Oracle to explain why there is a spike in the ‘failed probes on index block reclamation’ during the same time frame as the enq TX index contention wait event, but they have not answered my question.

I was hoping that some investigation by Oracle Support into the failed probes metric might get someone on the right track to discovering the bug. That hasn’t happened though.

Hi ,

Thanks for your sharing . The bug (or specific ktsp behave) is fatal in response time sensitive OLTP env.

I would like to ask my customer to coalesce those index where massive deleted regularly.

Thanks for your help again!

Yes, I saw that. I have applied patch 8286901 and set the event for version 10.2.0.4, but the problem still occurs periodically. And as I mentioned before, we see a correlation between enq TX waits and the failed probes on index block reclamation. Which is why I still think that it is a bug. I agree that trying to rebuild or coalesce the indexes are simply attempts to workaround the issue and not solve the root cause.

Early on when I started on this issue I did do some index dumps and could clearly see that we had lots of blocks with only 1 or 2 records after our mass delete jobs. I have provided Oracle Support with this information as well as oradump files while the problem is occurring, but they don’t seem to be able to find anything wrong so far.

If you are interested in seeing if you are experiencing a high ‘failed probes on index block reclamation’ event run the query below.

select SS.snap_id,
SS.stat_name,
TO_CHAR(S.BEGIN_INTERVAL_TIME, ‘DAY’) DAY,
S.BEGIN_INTERVAL_TIME,
S.END_INTERVAL_TIME,
SS.value,
SS.value – LAG(SS.VALUE, 1, ss.value) OVER (ORDER BY SS.SNAP_ID) AS DIFF
from DBA_HIST_SYSSTAT SS,
DBA_HIST_SNAPSHOT S
where S.SNAP_ID = SS.SNAP_ID
AND SS.stat_NAME = ‘failed probes on index block reclamation’
ORDER BY SS.SNAP_ID ;

在11gr2上的测试

—在最新的11gr2中进行了测试，仍可以重现该问题（如图单条insert引起了6675的buffer_gets,这是在更大量数据的情况下）。

我们可以猜测Oracle提供的one-off补丁中可能是为叶块分裂所会扫描的“空块”附加了一个上限，在未达到上限的情况下扫描仍会发生。而在主流的公开的发行版本中Oracle不会引入该补丁的内容。尝试在没有缓存的情况下引起分裂问题，分裂引起了大约4000个块的物理读，但该操作仍在0.12秒（有缓存是0.02秒，如图）内完成了（该测试使用普通ata硬盘，读取速度在100MB/S: Timing buffered disk reads: 306 MB in 3.00 seconds = 101.93 MB/sec）；从1月21日的ash视图中可以看到引起split的260会话处于单块读等待(db file sequential read)中,且已等待了43950us约等于44ms;这与良好io的经验值10ms左右有较大出入；我们可以确信io性能问题也是引发此叶块分裂延迟如此显性的一个重要因素。

具体结论

综上所述，在之前讨论的几个方案中,MSSM方式是不能避免索引分裂引起交易超时问题的；不删除数据的方案在许多对象上不可行；10.2.0.4上的one-off补丁因为目前仅存在Linux版本，可以考虑声请补丁后具体测试（因目前没有补丁所以处于未知状态）。Coalesce合并索引是目前既有的最具可操作性且无副作用的解决方案。

这位作者还没有填写简介。

查看所有文章

12 条评论

The third dba's blog » 对Oracle中索引叶块分裂而引起延迟情况的测试 … JP Google May 13, 2010

[…] posted here: The third dba's blog » 对Oracle中索引叶块分裂而引起延迟情况的测试 … ジャズ&オーディオ通信（from […]

Reply
admin June 7, 2010

It is a btree indx . The wait event is not ITL wait. So I think initrans does not matter。We can’t use reverse index , for business rule reason.

I have read Jonathan Lewis ‘s blog. He advise that coalesce will help “A work around to the problem is to do an ‘alter index coalesce’ after doing the massive delete. This cleans all the empty blocks and takes them out of the index structure. (You have to have done a commit after the delete for this to work).”

So I mention that I am testing coalesce , but I find this won’t reduce free block in bit-map block as below log:

19:46:45 SQL> create table test(t1 int) tablespace datatb;

Table created.
SQL> create index ind_t1 on test(t1);

Index created.

Elapsed: 00:00:00.01
19:47:08 SQL> begin
19:47:17 2 for i in 1..20000 loop
19:47:17 3 insert into test values(i);
19:47:17 4 commit;
19:47:17 5 end loop;
19:47:17 6 end;
19:47:18 7 /

PL/SQL procedure successfully completed.
SQL> set serveroutput on;
SQL> declare
2
3 l_fs1_bytes number;
4 l_fs2_bytes number;
5 l_fs3_bytes number;
6 l_fs4_bytes number;
7 l_fs1_blocks number;
8 l_fs2_blocks number;
9 l_fs3_blocks number;
10 l_fs4_blocks number;
11 l_full_bytes number;
12 l_full_blocks number;
13 l_unformatted_bytes number;
14 l_unformatted_blocks number;
15 begin
16 dbms_space.space_usage(
17 segment_owner => user,
18 segment_name => ‘IND_T1’,
19 segment_type => ‘INDEX’,
20 fs1_bytes => l_fs1_bytes,
21 fs1_blocks => l_fs1_blocks,
22 fs2_bytes => l_fs2_bytes,
23 fs2_blocks => l_fs2_blocks,
24 fs3_bytes => l_fs3_bytes,
25 fs3_blocks => l_fs3_blocks,
26 fs4_bytes => l_fs4_bytes,
27 fs4_blocks => l_fs4_blocks,
28 full_bytes => l_full_bytes,
29 full_blocks => l_full_blocks,
30 unformatted_blocks => l_unformatted_blocks,
31 unformatted_bytes => l_unformatted_bytes
32 );
33 dbms_output.put_line(‘ FS1 Blocks = ‘||l_fs1_blocks||’ Bytes = ‘||l_fs1_bytes);
34 dbms_output.put_line(‘ FS2 Blocks = ‘||l_fs2_blocks||’ Bytes = ‘||l_fs2_bytes);
35 dbms_output.put_line(‘ FS3 Blocks = ‘||l_fs3_blocks||’ Bytes = ‘||l_fs3_bytes);
36 dbms_output.put_line(‘ FS4 Blocks = ‘||l_fs4_blocks||’ Bytes = ‘||l_fs4_bytes);
37 dbms_output.put_line(‘Full Blocks = ‘||l_full_blocks||’ Bytes = ‘||l_full_bytes);
38 end;
39 /
FS1 Blocks = 0 Bytes = 0
FS2 Blocks = 4 Bytes = 32768
FS3 Blocks = 0 Bytes = 0
FS4 Blocks = 0 Bytes = 0
Full Blocks = 39 Bytes = 319488

PL/SQL procedure successfully completed.

Full blocks number is 39 ; fs2 is 4

SQL> delete test where t1>10000 and (mod(t1,2)!=0);

5000 rows deleted.
Now :
FS1 Blocks = 0 Bytes = 0
FS2 Blocks = 4 Bytes = 32768
FS3 Blocks = 0 Bytes = 0
FS4 Blocks = 0 Bytes = 0
Full Blocks = 39 Bytes = 319488
nothing changed .

SQL> alter index ind_t1 coalesce;

Index altered.
FS1 Blocks = 0 Bytes = 0
FS2 Blocks = 13 Bytes = 106496
FS3 Blocks = 0 Bytes = 0
FS4 Blocks = 0 Bytes = 0
Full Blocks = 30 Bytes = 245760

the free blocks increased

SQL> delete test where t1>10000;

5000 rows deleted.

SQL> commit;

Commit complete.

Now :
FS1 Blocks = 0 Bytes = 0
FS2 Blocks = 23 Bytes = 188416
FS3 Blocks = 0 Bytes = 0
FS4 Blocks = 0 Bytes = 0
Full Blocks = 20 Bytes = 163840
after full block delete ,the free blocks increased again.

SQL> alter index ind_t1 coalesce;

Index altered.

after coalesc , nothing changed:
FS1 Blocks = 0 Bytes = 0
FS2 Blocks = 23 Bytes = 188416
FS3 Blocks = 0 Bytes = 0
FS4 Blocks = 0 Bytes = 0
Full Blocks = 20 Bytes = 163840

but rebuild:
SQL> alter index ind_t1 rebuild;

Index altered.
FS1 Blocks = 0 Bytes = 0
FS2 Blocks = 1 Bytes = 8192
FS3 Blocks = 0 Bytes = 0
FS4 Blocks = 0 Bytes = 0
Full Blocks = 21 Bytes = 172032

As in my test, coalesce do not reduce free blocks , So I am totally confused.

Reply
admin June 7, 2010

As you can see from this posting, and from a related posting mentioned in comment 3 above, there are at least two reasons why you could see “Enq TX: index contention” if you do large deletes; and the advice to deal with the two different problems depends very strongly on which problem you are seeing.

I hope that the Oracle support consultant explained why the coalesce might help – and pointed out the bits of information in your SR that took him to that conclusion.

As far as your example is concerned the most important point is that it’s irrelevant if it isn’t a good match for your production scenario – do you really delete every other row from a large sequential section of your data; do you even get to a position where a number of deletions over a period of days leave your data in that pattern ?

A couple of comments on your results:

The first “no change” after deleting 25% of the data is probably because you hadn’t issued a commit at that point.

The “free blocks increased” after the first coalesce doesn’t match the perfect scenario – which would have been about 30 at full, and about 10 at FS1 – but the timing for bits being set and cleared in ASSM is an ongoing problem with Oracle and changes with version. We hope for some blocks to be cleared, though, because you have made a number of adjacent leaf blocks half empty, so some of those should be emptied and their contents combined with the next leaf block (i.e. instead of 20 half full blocks you hope to see 10 empty and 10 full).

The “free blocks increased again”. You’ve dleted the rest of the top half of the index, and committed (and the bitmaps have been updated). So where you have approximatly 40 full leaf blocks you now have 20 full and 20 empty.

The “after coalesce, nothing changed”. Because you didn’t have any partly empty leaf blocks that could be the subject of a coalesce.

After rebuild – you had a significant difference in the FS2 and Full blocks: but I’d be inclined to put that down to the weakness of the ASSM bitmap updates ccombined with the way in which the pctfree setting for indexes works – not to mention a couple of little oddities that show up when you’re dealing with small indexes. If you want to analyse exactly what’s going on, you need to do a load of block dumps of the bitmap blocks and the index blocks as you work.

To answer your questions:
The free blocks increase after the coalesce because that’s what the coalesce command is designed to do: free up blocks which can be made free. More significantly, though, it’s supposed to take completely empty blocks out of the index structure and the nasty thing about your example is that it suggests that none of the index leaf blocks has become empty when clearly they should have been. You need to do some block dumps to find out whether this is just a defect in the bitmap report, or whether the blocks really are still in the index structure even though they are empty.

The coalesce MAY resolve the contention if the contention is caused by the behaviour described in my posting. But if the coalesce is leaving empty blocks in the index structure then it MAY have no beneficial effect whatsoever.

The coalesce command does not lock the table.

Reply
admin June 7, 2010

That’s great for your advise . I realize that I should not use dbms_space package to estimate the index but analyze index validate structure.
But Oracle behave with index split is so unsatisfactory. I will test it on 11g r2.
Thx for your help again.

Reply
admin June 7, 2010

maclean,
It’s not a good idea to use “analyze index validate structure” on a production index as the command locks the underlying table. It’s also not very useful for your test case, since all it gives is average stats, and you need more detail. For testing, you might like to check the effects of a treedump (on your little test case first) as this will tell you which blocks are still part of the index structure, and how many rows are in each.

Reply
admin June 7, 2010

I still have a case open with Oracle. I believe that this is a bug in the Oracle code. The problem is that it has been difficult to create a reproducible test case for Oracle support. My specific issue was basically put on hold pending the results of another customer’s service request that appeared to have had the same issue, (9034788). Unfortunately they couldn’t reproduce the issue in that case either.

I believe that there is a correlation between the enq TX – index contention wait event and a spike in the number of ‘failed probes on index block reclamation. I have specifically asked Oracle to explain why there is a spike in the ‘failed probes on index block reclamation’ during the same time frame as the enq TX index contention wait event, but they have not answered my question.

I was hoping that some investigation by Oracle Support into the failed probes metric might get someone on the right track to discovering the bug. That hasn’t happened though.

Reply
admin June 7, 2010

There is another interesting little note under bug number 8286901, which says a similar type of bug (possibly the same one) is fixed in 11.2 – had you seen that ?

Reply
admin June 7, 2010

Yes, I saw that. I have applied patch 8286901 and set the event for version 10.2.0.4, but the problem still occurs periodically. And as I mentioned before, we see a correlation between enq TX waits and the failed probes on index block reclamation. Which is why I still think that it is a bug. I agree that trying to rebuild or coalesce the indexes are simply attempts to workaround the issue and not solve the root cause.

Early on when I started on this issue I did do some index dumps and could clearly see that we had lots of blocks with only 1 or 2 records after our mass delete jobs. I have provided Oracle Support with this information as well as oradump files while the problem is occurring, but they don’t seem to be able to find anything wrong so far.

If you are interested in seeing if you are experiencing a high ‘failed probes on index block reclamation’ event run the query below.

select SS.snap_id,
SS.stat_name,
TO_CHAR(S.BEGIN_INTERVAL_TIME, ‘DAY’) DAY,
S.BEGIN_INTERVAL_TIME,
S.END_INTERVAL_TIME,
SS.value,
SS.value – LAG(SS.VALUE, 1, ss.value) OVER (ORDER BY SS.SNAP_ID) AS DIFF
from DBA_HIST_SYSSTAT SS,
DBA_HIST_SNAPSHOT S
where S.SNAP_ID = SS.SNAP_ID
AND SS.stat_NAME = ‘failed probes on index block reclamation’
ORDER BY SS.SNAP_ID ;

Reply
admin June 7, 2010

Recently i have tested this bug on version 11.2.0.1 , still not fixed.
Maybe Oracle Development department persisted in this index leaf split behavior is right; Saving space is important than a little index contention delay?

Reply
admin June 7, 2010

Oracle 10G release 2 (10.2.0.4) has some new features to help DBA’s. For the issue you are requesting assistance, there is a feature called alter index SHRINK SPACE COMPACT;
You can use this clause to compact the index segments.

The shrink clause lets you manually shrink space in a index. This clause is valid only for segments in tablespaces with automatic segment management. By default, Oracle Database compacts the segment, adjusts the high water mark, and releases the recuperated space immediately. This is because the rows are moved in the process.

Compacting the segment requires row movement. Therefore, you must enable row movement for the object you want to shrink before specifying this clause. Further, if your application has any rowid-based triggers, you should disable them before issuing this clause.

Enabling Row Movement on a Table
To enable row movement on a table, or if you do not know whether row movement is enabled on the table, follow these steps:

In the Database Objects section of the Administration page, click Tables to administer tables.

The Tables page appears.

To find the target table for flashback table, you can enter one or both of the schema name in the Schema field and the table name in the Object Name field. Then click Go to search for the table. For example, search for tables in the hr schema. You may need to page through the search results to find your table.

After you find your table in the schema, select the table from the list of tables. For example, select employees. Click Edit.

The Edit Table: table_name page appears.

Click Options to navigate to the Options property page. Make sure Enable Row Movement is set to Yes, and click Apply to update the options for the table.

When the page has refreshed, you can click Tables in the locator link at the top of the page to return to the search results, and enable row movement on more tables by repeating these steps for each table.

Reply
admin June 7, 2010

the problem is that coalescing is not showing an increase of the free space, but instead an increase in free blocks (i.e. leaving the blocks assigned as opposed to releasing them).

An increase in free blocks tells me that the blocks are being coalesced, yet they are not being released.
This is something that the index shrink is designed to do while the coalesce, according to the documentation, is designed to move the rows, but I did not see anywhere where it said it would release the blocks.

There is also the matter of the blocks being partially released in bug: 8286901. I believe also that until we try the command I suggested, we cannot say this is anything more than similar.

Reply
admin August 25, 2010

Hdr: 8286901 10.2.0.3.0 RDBMS 10.2.0.3.0 SPACE PRODID-5 PORTID-226
Abstract: SPACE SEARCH INEFFICIENCY DURING INDEX SPLIT

PROBLEM:
——–
Issue encountered by customer and Oracle developer (Stefan Pommerenk).
He describes is thus:
“Space search performed by the index splitter can’t find space in neighboring
blocks, and then instead of allocating new space, we go and continue to
search for space elsewhere, which manifests itself in block reads from disk,
block cleanouts, and subsequent blocks written due to aggressive MTTR
setting.”

“To clarify: the cleanouts are not the problem per se. The culprit seems to
be that the space search performed by the index splitter can’t find space in
neighboring blocks, and then instead of allocating new space, we go and
continue to search for space elsewhere, which manifests itself in block reads
from disk, block cleanouts, and subsequent blocks written due to aggressive
MTTR setting. This action has caused other sessions to get blocked on TX
enqueue contention, blocked on the splitting session. Advice was to set 10224
trace event for the splitter for a short time only in order to get
diagnostics as to why the space search rejected most blocks.
> A secondary symptom are the bitmap level 1 block updates, which may or may
not be related to the space search; I’ve not seen them before, maybe because
I didn’t really pay attention 😛 , but the symptoms seen in the ASH trace
indicate it’s the same problem. Someone in space mgmt has to look at it to
confirm it is the same problem.”

DIAGNOSTIC ANALYSIS:
——————–
Performed by Oracle development.
OSS involvement is just in filing the bug as requested.

WORKAROUND:
———–

RELATED BUGS:
————-

REPRODUCIBILITY:
—————-

TEST CASE:
———-

STACK TRACE:
————

SUPPORTING INFORMATION:
———————–

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————
RELEASE NOTES:
]]When DML requires index maintenance, and this requires an index root
]]block split (as will be indicated by an increase in the BLEVEL of the
]]index), it is possible for the split to incur high CPU and logicalIO
]]costs in certain circumstances (specifically, when the index contains
]]many empty leaf blocks).

REDISCOVERY INFORMATION:
Excessive logical IO (on index segments), excessive “enq: TX – index
contention” (in OLTP environments), and excessive “failed probes on
index block reclamation” during index maintenance operations that cause
an increase in the BLEVEL of the index (i.e. a root block split).
WORKAROUND:
Proactively coalesce said indexes.

Additional documentation re: the fix esp. for backports.

The fix described above is controlled by event 43822.

The value of the “level” for this event is interpreted as a bitmask.

0x01 => enable the fix, otherwise stick to existing behavior.
0x02 => enable tracing of branch split operations on indexes, otherwise be
silent.

i.e.

-> if event 43822 is not set, the fix is completely inactive.

-> if event 43822 is set to level=1, the fix is silently enabled.

-> if event 43822 is set to level=2, the fix is disabled but tracing of
branch split operations to tracefiles is enabled.

-> if event 43822 is set to level=3, the fix and tracing are both enabled.

any other level setting is not supported and should be avoided.
*** 05/20/09 04:52 pm ***
Additional documentation re: the fix esp. for backports.

Whenever the fix is enabled (see previous entries re: event 43822), root
block splits examine no more than 5 empty blocks for reclamation before
attempting a segment extension.

The default value of “5” blocks examined can be changed via event 43820.

If event 43820 is unset, the default value of “5” applies. Any level >0 for
event 43820 represents the #blocks that will be examined for reclamation
before switching to segment extension.

it is expected that the default value (i.e. no setting for event 43820) will
work fine for any customer who wishes to enable the fix. however, event 43820
is available for experimentation and fine-tuning.
*** 05/20/09 04:56 pm ***
Additional documentation re: the fix esp. for backports.

Two new session statistics have been created as part of this fix.

-> “root node splits”: this represents the subset of “branch node splits”
that operate on the root block of an index segment. non-root branch splits do
not modify this statistic.

-> “index reclamation/extension switch”: is incremented once for every index
branch split operation where the fix described by this bug (see event 43822,
etc.) is enabled. this statistic provides a quick verification for whether
the fix is operational (i.e. both whether it is enabled via the control
event, and whether it actually was invoked at runtime on any indexes).
*** 05/20/09 04:58 pm ***
The 2 new statistics above will be mapped to spare statistics in backports;
see backport bug entries for specific details about which spare statistics
were actually used (or if no suitable spares were available and the
statistics were not implemented; 10g backports will not have spares).

Reply

ASKMACLEAN

对Oracle中索引叶块分裂而引起延迟情况的测试和分析

具体结论

12 条评论

发表评论取消回复

具体结论

【Oracle Database 12c】12c 常见问题FAQ

Script:收集ASM诊断信息

Oracle中表丢失被drop的恢复过程

12 条评论

发表评论 取消回复

发表评论取消回复