- Oracle ASM的基本思想
- 通过应用ASM使得设计/使用简便化
- ASM设计指针
- ASM的Rebalancing 与 stripe
- High Redundancy的优点
- vs. Sympro /阶层化存储
- 对每个访问类型都分配磁盘配置数据
- Seek等待时间 + 回转等待时间 > 读出时间
–sequential访问(重视完全灵活使用传送带宽的吞吐量)
- seek等待时间 + 回转等待时间 < 读出时间
Hard Disk Drive技术演进
随机访问的并列化时sequential访问也高速化
1.提高Disk回转速度
缩短同样的数据的读出时间
变更sector的配置,缩小搜索范围
2.Disk外周的sector的高密度化
周边seek增多,头的移动距离锁定,随机访问高速化
3.Disk的低价化
通过将多个disk捆在一起,可以保证I/O性能
存储物理设计的変化
从部分最优化到整体最优化
- 通过访问类型来分离的需求减少
Oracle ASM的基本思想
Stripe And Mirror Everything (S. A. M. E)
- 设计手法为:以均衡利用所有disk为目的,将数据进行stripe,分散在所有磁盘上,进行Mirroring。
Oracle Automatic Storage Management
架构
- ASM实例
- Cluster Synchronization Services
- ASM Diskgroup
- ASM Disk
Allocation Unit, File Extent & ASM File
- Allocation Unit(AU:=分配单位)
- File Extent
- 最初的20000extent(0~19999) à 1*AU à 1~20GB (1AU=1MB时)
- 下一个的20000extent(20000~39999) à 4*AU à 20~40GB (1AU=1MB时)
- 下一个的20000extent以上(40000~) à16*AU à 40GB~ (1AU=1MB时)
- ASM File(= Data File, etc)
- 对于ASM Diskgroup中包含的所有的ASM Disk,将ASM File(Data File)以File Extent(AU)单位来配置
S.A.M.E只能应用OLTP处理?
- OLTP处理
- DWH/批量处理
- 比如、15krpm的HDD有60个的情况(假设中等规模系统)
- 通过Oracle ORION的验证结果、40-45iops/Disk
- 40iops × 1MB/io = 40MB/sec
- 40MB/sec × 60本 = 2.4GB/sec
- 4Gbit / 8 bit / 1sec = 512MB/sec
- 512MB/sec × 2本 = 1GB/sec
- 1GB/sec × 2节点 = 2.0GB/sec
传统的RAW device结构的课题-1
可以通过采用Oracle ASM来解决
- 细致分割表区域
- 追加数据文件时,没有对现有文件进行rebalance
通过Oracle ASM使得管理简单化
从顾客的传统结构中观察到的ASM的优点
- 操作简便化
- 减少管理对象
- 数据再配置的工数不要
- 将多台便宜的存储器绑定在一起
- 没有硬件RAID or RAID0(最多也就Disk8台以内)
- 一个RAID Group只会切换一个Logical Unit
- 尽可能将较多的LU绑定在一个ASM Diskgroup中
- 可用性通过High Redundancy(3重化)来担保
- 通过一样性能/尺寸的Disk(LU)来构成ASM Diskgroup
- 通过SSD构成面向Redo日志的Diskgroup
- 不使用存储的虚拟化功能(Thin Provisioning/阶层化)
- 构成绑定FC带宽不同的多个存储器的ASM Diskgroup时,需要调整各个存储器中切换的LU数
- 使用回转数不同的HDD时,请调整Disk个数构成RAID
ASM Diskgroup的构成
Oracle ASM的rebalance(数据重新配置)
可以不终止数据库就重新配置
- 追加、删除ASM Disk(Logical Unit:LU)时, 为了维持「S. A. M. E」,需要重新配置数据
通过ASM,分散配置数据
使用Data File单位对各ASM Disk的File Extent(AU)进行均匀地分割
查看stripe状态
$ sqlplus / as sysasm
SQL>
create diskgroup TDG external redundancy disk
'/dev/emcpowere1','/dev/emcpowerf1','/dev/emcpowerg1','/dev/emcpowerh1' ;
set linesize 150 pagesize 50000
col name for a8
col path for a36
select b.group_number,b.disk_number,a.name,b.path,b.state,b.total_mb
from v$asm_diskgroup a,v$asm_disk b
where a.group_number(+)=b.group_number
and a.name='TDG'
order by 2;
GROUP_NUMBER DISK_NUMBER NAME PATH STATE TOTAL_MB
------------ ----------- -------- ------------------------------------ -------- ----------
4 0 TDG /dev/emcpowere1 NORMAL 136850
4 1 TDG /dev/emcpowerf1 NORMAL 136850
4 2 TDG /dev/emcpowerg1 NORMAL 136850
4 3 TDG /dev/emcpowerh1 NORMAL 136850
制成ASM File(Data File = 表空间)
$ sqlplus / as sysdba
SQL>
create tablespace TS0 datafile '+TDG(DATAFILE)' size 32m;
create tablespace TS1 datafile '+TDG(DATAFILE)' size 32m;
create tablespace TS2 datafile '+TDG(DATAFILE)' size 32m;
select C.NAME "GRP_NAME",
A.GROUP_NUMBER "GRP#",
A.FILE_NUMBER "FILE#",
A.TYPE "FILE_TYPE",
B.NAME "FILE_NAME",
A.BLOCKS "USED(BLK)",
trunc(A.BYTES/1024/1024) "USED(MB)",
trunc(A.SPACE/1024/1024) "ALLOCATE(MB)"
from V$ASM_FILE "A", V$ASM_ALIAS "B", V$ASM_DISKGROUP "C"
where A.GROUP_NUMBER = B.GROUP_NUMBER
and A.GROUP_NUMBER = C.GROUP_NUMBER
and A.FILE_NUMBER = B.FILE_NUMBER
and B.FILE_NUMBER < 4294967295 -- without Directories
and B.SYSTEM_CREATED = 'Y' -- without Symbolic Links
and C.NAME = 'TDG'
order by 3;
GRP_NAME GRP# FILE# FILE_TYPE FILE_NAME USED(BLK) USED(MB) ALLOCATE(MB)
-------- ---------- ---------- ---------------- -------------------- ---------- ---------- ------------
TDG 4 256 DATAFILE TS0.256.792245841 4097 32 33
TDG 4 257 DATAFILE TS1.257.792245841 4097 32 33
TDG 4 258 DATAFILE TS2.258.792245843 4097 32 33
使用了X$KFDAT的样本SQL
select A.NAME "GRP_NAME",
B.PATH "DV_NAME",
A.GROUP_NUMBER "GRP#",
B.DISK_NUMBER "DISK#",
X.AUNUM_KFDAT "AU#",
X.FNUM_KFDAT "FILE#",
X.XNUM_KFDAT "ASM_EXT#",
X.SIZE_KFDAT "EXT_SIZE",
C.NAME "FILE_NAME"
from X$KFDAT "X", V$ASM_DISKGROUP "A", V$ASM_DISK "B", V$ASM_ALIAS "C"
where X.GROUP_KFDAT = A.GROUP_NUMBER
and X.GROUP_KFDAT = B.GROUP_NUMBER
and X.NUMBER_KFDAT = B.DISK_NUMBER
and X.FNUM_KFDAT = C.FILE_NUMBER
and A.GROUP_NUMBER = B.GROUP_NUMBER
and A.GROUP_NUMBER = C.GROUP_NUMBER
and X.FNUM_KFDAT != 0
and C.FILE_NUMBER < 4294967295
and C.SYSTEM_CREATED = 'Y'
and A.NAME = 'TDG'
order by GRP#, DISK#, AU# ;
使用了X$KFDAT的样本SQL (部分)
GRP_NAME DV_NAME GRP# DISK# AU# FILE# ASM_EXT# EXT_SIZE FILE_NAME
-------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- --------------------
TDG /dev/emcpowere1 4 0 15 256 1 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 16 256 5 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 17 256 9 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 18 256 13 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 19 256 17 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 20 256 21 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 21 256 25 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 22 256 29 1 TS0.256.792245841
TDG /dev/emcpowere1 4 0 23 257 3 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 24 257 7 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 25 257 11 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 26 257 15 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 27 257 19 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 28 257 23 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 29 257 27 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 30 257 31 2 TS1.257.792245841
TDG /dev/emcpowere1 4 0 31 258 2 4 TS2.258.792245843
TDG /dev/emcpowere1 4 0 32 258 6 4 TS2.258.792245843
TDG /dev/emcpowere1 4 0 33 258 10 4 TS2.258.792245843
TDG /dev/emcpowere1 4 0 34 258 14 4 TS2.258.792245843
TDG /dev/emcpowere1 4 0 35 258 18 4 TS2.258.792245843
TDG /dev/emcpowere1 4 0 36 258 22 4 TS2.258.792245843
使用了X$KFDAT的样本SQL
select A.AU#,
B.FNUM_KFDAT||' ['||B.XNUM_KFDAT||']' "emcpowere1",
C.FNUM_KFDAT||' ['||C.XNUM_KFDAT||']' "emcpowerf1",
D.FNUM_KFDAT||' ['||D.XNUM_KFDAT||']' "emcpowerg1",
E.FNUM_KFDAT||' ['||E.XNUM_KFDAT||']' "emcpowerh1"
from (select level + MIN - 1 "AU#"
from (select min(MIN) MIN, max(MAX) MAX
from (select min(AUNUM_KFDAT) "MIN", max(AUNUM_KFDAT) "MAX"
from X$KFDAT
where GROUP_KFDAT = 4
and FNUM_KFDAT != 0
and FNUM_KFDAT != 1048575
group by NUMBER_KFDAT)
) connect by level < MAX + 1) A,
(select AUNUM_KFDAT, FNUM_KFDAT, XNUM_KFDAT from X$KFDAT where GROUP_KFDAT=4 and NUMBER_KFDAT=0) B,
(select AUNUM_KFDAT, FNUM_KFDAT, XNUM_KFDAT from X$KFDAT where GROUP_KFDAT=4 and NUMBER_KFDAT=1) C,
(select AUNUM_KFDAT, FNUM_KFDAT, XNUM_KFDAT from X$KFDAT where GROUP_KFDAT=4 and NUMBER_KFDAT=2) D,
(select AUNUM_KFDAT, FNUM_KFDAT, XNUM_KFDAT from X$KFDAT where GROUP_KFDAT=4 and NUMBER_KFDAT=3) E
where A.AU#=B.AUNUM_KFDAT(+)
and A.AU#=C.AUNUM_KFDAT(+)
and A.AU#=D.AUNUM_KFDAT(+)
and A.AU#=E.AUNUM_KFDAT(+) ;
另一个使用了X$KFDAT的样本SQL
AU# emcpowere1 emcpowerf1 emcpowerg1 emcpowerh1
---------- ------------ ------------ ------------ -----------
2 1 [0] 3 [2] 2 [0] 3 [3]
3 3 [0] 3 [6] 3 [1] 3 [7]
4 3 [4] 3 [10] 3 [5] 3 [11]
5 3 [8] 3 [14] 3 [9] 3 [15]
6 3 [12] 3 [18] 3 [13] 3 [19]
7 3 [16] 3 [22] 3 [17] 3 [23]
8 3 [20] 3 [26] 3 [21] 3 [27]
9 3 [24] 3 [30] 3 [25] 3 [31]
10 3 [28] 3 [34] 3 [29] 3 [35]
11 3 [32] 3 [38] 3 [33] 3 [39]
12 3 [36] 4 [0] 3 [37] 256 [0]
13 3 [40] 5 [0] 3 [41] 256 [4]
14 6 [0] 256 [3] 4 [1] 256 [8]
15 256 [1] 256 [7] 1 [1] 256 [12]
16 256 [5] 256 [11] 256 [2] 256 [16]
17 256 [9] 256 [15] 256 [6] 256 [20]
18 256 [13] 256 [19] 256 [10] 256 [24]
19 256 [17] 256 [23] 256 [14] 256 [28]
20 256 [21] 256 [27] 256 [18] 256 [32]
21 256 [25] 256 [31] 256 [22] 257 [0]
22 256 [29] 257 [2] 256 [26] 257 [4]
23 257 [3] 257 [6] 256 [30] 257 [8]
AU# emcpowere1 emcpowerf1 emcpowerg1 emcpowerh1
---------- ------------ ------------ ------------ -----------
24 257 [7] 257 [10] 257 [1] 257 [12]
25 257 [11] 257 [14] 257 [5] 257 [16]
26 257 [15] 257 [18] 257 [9] 257 [20]
27 257 [19] 257 [22] 257 [13] 257 [24]
28 257 [23] 257 [26] 257 [17] 257 [28]
29 257 [27] 257 [30] 257 [21] 257 [32]
30 257 [31] 258 [3] 257 [25] 258 [0]
31 258 [2] 258 [7] 257 [29] 258 [4]
32 258 [6] 258 [11] 258 [1] 258 [8]
33 258 [10] 258 [15] 258 [5] 258 [12]
34 258 [14] 258 [19] 258 [9] 258 [16]
35 258 [18] 258 [23] 258 [13] 258 [20]
36 258 [22] 258 [27] 258 [17] 258 [24]
37 258 [26] 258 [31] 258 [21] 258 [28]
38 258 [30] 0 [524640] 258 [25] 258 [32]
39 0 [524648] 0 [5767528] 258 [29] 0 [524648]
40 0 [6291824] 0 [6291824] 0 [524656] 0 [6291824]
追加Disk+rebalance后
AU# emcpowere1 emcpowerf1 emcpowerg1 emcpowerh1 emcpoweri1
---------- ------------ ------------ ------------ ------------ ------------
2 1 [0] 3 [2] 258 [25] 3 [3] 2 [0]
3 258 [30] 3 [0] 258 [21] 3 [7] 3 [6]
4 3 [4] 3 [10] 3 [5] 3 [1] 3 [11]
5 3 [8] 3 [14] 3 [9] 3 [15] 3 [16]
6 3 [12] 3 [18] 3 [13] 3 [19] 3 [21]
7 258 [26] 3 [22] 3 [17] 3 [23] 3 [26]
8 3 [20] 258 [31] 258 [17] 3 [27] 3 [31]
9 3 [24] 3 [30] 3 [25] 5 [0] 3 [36]
10 3 [28] 3 [34] 3 [29] 3 [35] 3 [41]
11 3 [32] 3 [38] 3 [33] 3 [39] 256 [0]
12 258 [22] 4 [0] 3 [37] 258 [32] 256 [5]
13 3 [40] 258 [27] 258 [13] 256 [4] 256 [10]
14 6 [0] 256 [3] 4 [1] 256 [8] 256 [15]
15 256 [1] 256 [7] 1 [1] 256 [12] 256 [20]
16 258 [18] 256 [11] 256 [2] 256 [16] 256 [25]
17 256 [9] 258 [23] 256 [6] 258 [28] 256 [30]
18 256 [13] 256 [19] 258 [5] 256 [24] 257 [2]
19 256 [17] 256 [23] 256 [14] 256 [28] 257 [7]
20 256 [21] 256 [27] 256 [18] 256 [32] 257 [12]
21 257 [0] 256 [31] 256 [22] 258 [20] 257 [17]
22 256 [29] 258 [15] 256 [26] 257 [4] 257 [22]
23 257 [3] 257 [6] 258 [1] 257 [8] 257 [27]
24 258 [10] 257 [10] 257 [1] 258 [16] 257 [32]
25 257 [11] 257 [14] 257 [5] 257 [16] 258 [4]
26 257 [15] 257 [18] 257 [9] 257 [20] 258 [9]
27 257 [19] 258 [11] 257 [13] 257 [24] 258 [14]
28 257 [23] 257 [26] 257 [29] 257 [28] 258 [19]
29 258 [6] 257 [30] 257 [21] 258 [12] 258 [24]
30 257 [31] 258 [3] 257 [25] 258 [0] 258 [29]
31 258 [2] 258 [7] 0 [524584] 258 [8] 0 [524584]
32 0 [524592] 0 [524592] 0 [2097456] 0 [524592] 0 [2097456]
AU# emcpowere1 emcpowerf1 emcpowerg1 emcpowerh1
---------- ------------ ------------ ------------ -----------
2 1 [0] 3 [2] 2 [0] 3 [3]
3 3 [0] 3 [6] 3 [1] 3 [7]
4 3 [4] 3 [10] 3 [5] 3 [11]
5 3 [8] 3 [14] 3 [9] 3 [15]
6 3 [12] 3 [18] 3 [13] 3 [19]
7 3 [16] 3 [22] 3 [17] 3 [23]
8 3 [20] 3 [26] 3 [21] 3 [27]
9 3 [24] 3 [30] 3 [25] 3 [31]
10 3 [28] 3 [34] 3 [29] 3 [35]
11 3 [32] 3 [38] 3 [33] 3 [39]
12 3 [36] 4 [0] 3 [37] 256 [0]
13 3 [40] 5 [0] 3 [41] 256 [4]
14 6 [0] 256 [3] 4 [1] 256 [8]
15 256 [1] 256 [7] 1 [1] 256 [12]
16 256 [5] 256 [11] 256 [2] 256 [16]
17 256 [9] 256 [15] 256 [6] 256 [20]
18 256 [13] 256 [19] 256 [10] 256 [24]
19 256 [17] 256 [23] 256 [14] 256 [28]
20 256 [21] 256 [27] 256 [18] 256 [32]
..........................................
30 257 [31] 258 [3] 257 [25] 258 [0]
31 258 [2] 258 [7] 257 [29] 258 [4]
32 258 [6] 258 [11] 258 [1] 258 [8]
33 258 [10] 258 [15] 258 [5] 258 [12]
34 258 [14] 258 [19] 258 [9] 258 [16]
35 258 [18] 258 [23] 258 [13] 258 [20]
36 258 [22] 258 [27] 258 [17] 258 [24]
37 258 [26] 258 [31] 258 [21] 258 [28]
38 258 [30] 0 [524640] 258 [25] 258 [32]
39 0 [524648] 0 [5767528] 258 [29] 0 [524648]
40 0 [6291824] 0 [6291824] 0 [524656] 0 [6291824]
rebalance的操作
重新查看、理解stripe状态
- 使用ASM File(= Data File)单位均匀分布
- 移动到追加Disk的File Extent的对象是所有ASM File
- 被去除的部分中,从位于ASM Disk之后的File Extent(AU)开始向上移动
- 可以变更PSR11.2.0.2~rebalance強度(power_limit)的操作
- 但是如果11.2.0.2没有设定ASM Diskgroup的COMPATIBILITY属性的话,12以上就不能设置了
- 启动执行rebalance的ARB0进程的节点是一个
- power_limit的値是ARB0进程同时执行的非同步I/O数
set linesize 150 select * from V$ASM_OPERATION ; GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- ----------- 3 REBAL RUN 128 128 71 31408 945 33 $ ps -efH|grep arb|grep -v grep grid 11242 1 4 21:16 ? 00:00:04 asm_arb0_+ASM1Oracle ASM中Disk故障时的操作 可以完善存储的缺点
- 前提是Mirroring结构
- 从secondary中读取,自动修复不良块
- 自动online处理故障Disk
- 通过高速镜像再同步,可以从生存disk中同步最小限度的数据
- 可以设定每个文件的Mirroring(External/Normal/High)
考虑了Disk的二重故障的构成方法
RAID1+0 × Normal vs. RAID0 × High
- RAID1+0 × ASM Normal Redundancy
- RAID0 × ASM High Redundancy
- Disk故障时、因为包含Disk,RAID group整体都无法使用了,所以需要充分考虑性能,来决定RAID0的个数
RAID1+0 × Normal vs. RAID0 × High
Write时的I/O量与分散状況
Read时访问的Disk数
ASM High Redundancy的优点
ASM是处理二重故障的最优选项
- 比较前一页中的各个结构时,发现 RAID0 + ASM High Redundancy好多了
| RAID1+0 × Normal | RAID0 × High | |
| Disk使用量 | 4倍 | 3倍 |
| Write时存储内的I/O量 | 4倍 | 3倍 |
| Write时写入的Disk数 | 8本 | 12本 |
| Write时的FC带宽的数据量 | 2倍 | 3倍 |
| Read时访问Disk数 | 6本 | 12本 |
- 减少磁盘追加工数
- 尽量规避通过ASM追加磁盘时进行rebalance
- 对I/O性能要求不高的小规模多个数据库都集中在一个较大的存储器中时
–通过将使用率与现实使用量相协调,可以大幅减少成本
–多个数据库之间,不执行存储管理的操作(制成/删除RAID Group、切换LU、分割服务器等)、通过熟练使用物理Disk,可以减少使用的人工
–ASM并不是不支持Sympro
- 特别是储存数据库的表区域的数据的结构与Sympro很相似
- 即使使用ASM均匀分散了,也可能出现在特定的物理disk集中访问(可能发生Hotspot)
- rebalance的思想
- 对所有的LU按照I/O要求均匀分散,提高I/O性能
- 考虑到数据库对象的机制
- 考虑到数据库对象的机制
- 无法根据访问频率来分散
- 多个存储虚拟化layer可能导致trouble shooting的延迟以及误操作
- 顶多到理论Volume水平。
- 解放空白区域时,需要执行特别的utility
- ASM Storage Reclamation Utility (ASRU)
- Zero Page Reclamation Utility
- SR 3-3222667411: RC:ASRU resize was not successful when ASM allocation unit size is 32MB
- SR 3-3214700961: ASRU should not reclaim space when disk failed to be resized to smaller size
- 通过Oracle ASM Storage Reclamation Utility与3PAR Thin Persistence的合作,维持较高的存储使用率
- ASUR Download à ASM Storage Reclamation Utility (ASRU) Download
- 分析过去的I/O实绩,在某个时点自动重新配置数据
- 根据访问频率,将数据移动到合适的device(SSD or HDD)中
使用Oracle Database时的注意事项
通过自动阶层化功能进行不透明化(1)
- 分析过去的I/O实绩,在某个时点重新自动配置数据
- 包含维护时以及故障时的I/O需求
- 由于访问模式(天or月、白天 or 晚上、临时表) 访问数据时间可能产生偏差
- SSD: 适合Small Random Read HDD: 适合Large Sequential Read
- 分析过去的I/O实绩,在某个时点重新自动配置数据
- 需要考虑I/O需求的时机与再配置的事务影响进行设计
- 分析过去的I/O实绩,在某个时点重新自动配置数据
- 一个表数据被分配到SSD与HDD的中进行配置时,因为 段(表/索引)单位、表区域(数据文件)单位的I/O性能都会变成平均値,所以很难指定瓶颈
- 很难重现数据配置以及发生问题时的具体状况