Overview

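In an Oracle 11gR2 RAC environment, changing ASM disk paths is a complex but common operational task. Typical scenarios include migrating from native AIX MPIO to Veritas DMP multipathing, device names changing after a storage replacement, and adjustments to UDEV rules. Because the 11gR2 Clusterware stack depends heavily on ASM (the OCR and the voting disks are stored in ASM), changing the asm_diskstring parameter alone does not solve the problem; the voting disk paths must also be updated correctly.

This article walks through the complete procedure for safely changing ASM disk paths in an 11gR2 RAC environment, covering analysis of the failure symptoms, the recovery procedure, and points to watch.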


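1. Background and Failure Symptoms

1.1 Typical Scenario

The following is a real migration case:

Item                    Before          After
Operating system        AIX 6.1         (unchanged)
Grid version            11.2.0.3        (unchanged)
Multipathing software   AIX MPIO        Veritas DMP
ASM disk path           /dev/rhdisk*    /dev/vx/rdmp/*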

1.2 Failure Symptoms

After asm_diskstring was changed, CRS failed to start and the following errors appeared in the log:

# Key lines from the cssd log
2012-07-13 15:07:29.762: [ CSSD]clssnmReadDiscoveryProfile: voting file discovery string(/dev/vx/rdmp/*)
2012-07-13 15:07:29.802: [ CLSF]checksum failed for disk:/dev/vx/rdmp/v_df8000_916:
2012-07-13 15:07:29.816: [ CSSD]clssnmvDiskVerify: Successful discovery of 0 disks
2012-07-13 15:07:29.816: [ CSSD]clssnmvFindInitialConfigs: No voting files found
2012-07-13 15:07:29.816: [ CSSD](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds

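1.3 Root Cause Analysis

Startup dependency chain in 11gR2 RAC:


OHASD starts
    │
    ▼
CSSD starts ──▶ must find the voting disks
    │
    ▼
ASM instance starts ──▶ requires a correct asm_diskstring
    │
    ▼
CRS services start ──▶ need access to the OCR (stored in ASM)
    │
    ▼
Database instances start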

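The root cause is that the path information CSSD uses to locate the voting disks is stored in the GPnP profile. Even after asm_diskstring has been changed, CSSD keeps searching for the voting disks along the old path, so startup fails.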
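
To see which discovery string the GPnP profile currently holds, the profile XML can be inspected directly. The helper below is a minimal sketch, not an Oracle-supplied tool. It assumes the profile contains an orcl:ASM-Profile element with a DiscoveryString attribute, which is the layout commonly seen in 11.2 profiles; check it against your own gpnptool output. The function name gpnp_asm_diskstring is hypothetical.

```shell
#!/bin/sh
# gpnp_asm_diskstring: read a GPnP profile (XML) on stdin and print the
# DiscoveryString attribute of the ASM-Profile element.
# Assumption: the element looks like
#   <orcl:ASM-Profile id="asm" DiscoveryString="..." SPFile="..."/>
gpnp_asm_diskstring() {
    # The profile is often a single long line, so split it into one
    # element per line first; the CSS-Profile element carries its own
    # DiscoveryString and must not be matched by mistake.
    tr '>' '\n' |
        sed -n 's/.*<orcl:ASM-Profile[^>]*DiscoveryString="\([^"]*\)".*/\1/p'
}
```

Typical use, as the grid user: gpnptool get -o- | gpnp_asm_diskstring. If the value printed still shows the old path after the parameter change, CSSD will keep scanning the old location.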


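2. Solution Overview

2.1 Summary of Steps

Step  Action                                            Run as
1     Stop the OHASD service                            root
2     Start CRS in exclusive mode (ASM only)            root
3     Change asm_diskstring and mount the disk group    grid
4     Update the ASM SPFILE                             grid
5     Replace the voting disk paths                     root
6     Verify the OCR status                             root
7     Restart the CRS services                          root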

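2.2 Prerequisites

  • The new ASM disk paths are configured correctly (UDEV rules, multipathing software, and so on)
  • Disk permissions are correct (owned by the grid user, mode 660)
  • Disk paths are identical on all RAC nodes
  • Validating the procedure in a test environment first is recommended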
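
The ownership and mode prerequisite can be checked with a small helper before touching the cluster. This is a sketch under stated assumptions: it relies on GNU stat (Linux), so on AIX a different command would be needed, and the function name check_asm_device is hypothetical.

```shell
#!/bin/sh
# check_asm_device PATH OWNER MODE
# Print OK/BAD/MISSING for one device and return non-zero on a problem.
# Assumption: GNU stat is available (stat -c with %U and %a).
check_asm_device() {
    dev=$1; want_owner=$2; want_mode=$3
    [ -e "$dev" ] || { echo "MISSING $dev"; return 1; }
    # %U = owner name, %a = permission bits in octal (e.g. 660)
    actual=$(stat -c '%U %a' "$dev") || return 1
    if [ "$actual" = "$want_owner $want_mode" ]; then
        echo "OK $dev"
    else
        echo "BAD $dev ($actual, expected $want_owner $want_mode)"
        return 1
    fi
}
```

For example: for d in /dev/rasm-disk*; do check_asm_device "$d" grid 660; done. Running the same loop on every node also helps confirm that the paths match across the cluster.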

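3. Detailed Procedure

3.1 Environment Preparation and Verification

First verify the current state of the environment:

# Log in as the grid user and check the current ASM configuration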
[grid@rac1 ~]$ export ORACLE_HOME=/u01/app/11.2.0/grid
[grid@rac1 ~]$ sqlplus / as sysasm

SQL> SHOW PARAMETER asm_diskstring

NAME                 TYPE        VALUE
-------------------- ----------- ------------------------------
asm_diskstring       string      /dev/asm*

SQL> SELECT NAME, STATE, TYPE FROM V$ASM_DISKGROUP;

NAME            STATE       TYPE
--------------- ----------- ------
SYSTEMDG        MOUNTED     NORMAL
DATADG          MOUNTED     NORMAL

Check the current voting disk and OCR locations:

# Run as root
[root@rac1 ~]# crsctl query css votedisk

##  STATE    File Universal Id                File Name            Disk group
--  -----    -----------------                ---------            ----------
 1. ONLINE   6896bfc3d1464f9fbf0ea9df87e023ad (/dev/asm-diskb)     [SYSTEMDG]
 2. ONLINE   58eb81b656084ff2bfd315d9badd08b7 (/dev/asm-diskc)     [SYSTEMDG]
 3. ONLINE   6bf7324625c54f3abf2c942b1e7f70d9 (/dev/asm-diskd)     [SYSTEMDG]
Located 3 voting disk(s).

[root@rac1 ~]# ocrcheck

Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2844
         Available space (kbytes) :     259276
         ID                       :  879001605
         Device/File Name         :  +SYSTEMDG
         Cluster registry integrity check succeeded

3.2 Change the Disk Device Names (UDEV Example)

This example renames the ASM disks from /dev/asm-disk* to /dev/rasm-disk*.

# Back up the original UDEV rules file
[root@rac1 ~]# cd /etc/udev/rules.d/
[root@rac1 rules.d]# cp 99-oracle-asmdevices.rules 99-oracle-asmdevices.rules.bak

# Edit the UDEV rules and change the device names
[root@rac1 rules.d]# vi 99-oracle-asmdevices.rules

Example of the modified UDEV rules:

# Change NAME="asm-disk*" to NAME="rasm-disk*"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", \
    RESULT=="SATA_VBOX_HARDDISK_VB09cadb31-cfbea255_", \
    NAME="rasm-diskb", OWNER="grid", GROUP="asmadmin", MODE="0660"

KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", \
    RESULT=="SATA_VBOX_HARDDISK_VB5f097069-59efb82f_", \
    NAME="rasm-diskc", OWNER="grid", GROUP="asmadmin", MODE="0660"

# ... configuration for the remaining disks

Important: in a RAC environment, the same UDEV changes must be made on every node.
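
Because every node needs identical rules, one option is to generate the rule lines from a single list of disk IDs instead of editing each file by hand. The sketch below assumes the rule format shown in the sample above (the older scsi_id -g -u -s syntax) and the grid:asmadmin ownership used there. The function name asm_udev_rule and the file name disks.txt are hypothetical.

```shell
#!/bin/sh
# asm_udev_rule SCSI_ID DEVICE_NAME
# Print one UDEV rule on a single line, in the same format as the
# sample rules (which use a backslash continuation instead).
asm_udev_rule() {
    # %%p prints a literal %p, which udev later expands itself.
    printf 'KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %%p", '
    printf 'RESULT=="%s", NAME="%s", OWNER="grid", GROUP="asmadmin", MODE="0660"\n' "$1" "$2"
}

# Example (not executed here): disks.txt holds "SCSI_ID DEVICE_NAME" pairs.
#   while read -r id name; do asm_udev_rule "$id" "$name"; done < disks.txt
```

Generating the file once and copying the result to each node keeps the rules byte-for-byte identical, which makes later comparison with a checksum straightforward.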

3.3 Reboot to Apply the New Device Names

# Reboot the operating system
[root@rac1 ~]# init 6

# After the reboot, verify the new device names
[root@rac1 ~]# ls -l /dev/*asm*

brw-rw---- 1 grid asmadmin 8,  16 Jul 15 04:15 /dev/rasm-diskb
brw-rw---- 1 grid asmadmin 8,  32 Jul 15 04:15 /dev/rasm-diskc
brw-rw---- 1 grid asmadmin 8,  48 Jul 15 04:15 /dev/rasm-diskd
brw-rw---- 1 grid asmadmin 8,  64 Jul 15 04:15 /dev/rasm-diske
brw-rw---- 1 grid asmadmin 8,  80 Jul 15 04:15 /dev/rasm-diskf

At this point CRS tries to start but fails. Check the log to confirm the problem:

# Check the CSSD log
[root@rac1 ~]# tail -100 $GRID_HOME/log/$(hostname)/cssd/ocssd.log

2012-07-15 04:17:45.208: [ SKGFD]Discovery with str:/dev/asm*:
2012-07-15 04:17:45.208: [ SKGFD]UFS discovery with :/dev/asm*:
2012-07-15 04:17:45.208: [ CSSD]clssnmvDiskVerify: Successful discovery of 0 disks
2012-07-15 04:17:45.208: [ CSSD]clssnmvFindInitialConfigs: No voting files found
2012-07-15 04:17:45.208: [ CSSD](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found.
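
When the log is long, it can help to pull out only the discovery strings CSSD actually used. The helper below is a sketch that assumes the two message formats quoted in this article ("Discovery with str:...:" and "voting file discovery string(...)"); other Grid versions may word these lines differently. The function name cssd_discovery_strings is hypothetical.

```shell
#!/bin/sh
# cssd_discovery_strings: read ocssd.log on stdin and print each distinct
# discovery string that CSSD reported using.
# Assumption: log lines follow the two formats quoted in this article.
cssd_discovery_strings() {
    sed -n \
        -e 's/.*voting file discovery string(\(.*\))$/\1/p' \
        -e 's/.*]Discovery with str:\(.*\):$/\1/p' | sort -u
}
```

For example: cssd_discovery_strings < $GRID_HOME/log/$(hostname)/cssd/ocssd.log. In the case above it prints /dev/asm*, confirming that CSSD is still scanning the old path.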

3.4 Stop the OHASD Service

[root@rac1 ~]# crsctl stop has -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

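3.5 Start CRS in Exclusive Mode

Start the stack with the -excl -nocrs options. This brings up CSSD and the ASM instance in exclusive mode without starting the full CRS stack (the -nocrs option is available from 11.2.0.2 onward):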

[root@rac1 ~]# crsctl start crs -excl -nocrs

CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded

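# Optional workaround used in this case: loosen permissions on /u01 to rule out permission problems.
# A world-writable mode is risky in production; prefer correcting ownership on the specific directories involved.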
[root@rac1 ~]# chmod 777 /u01

3.6 Change the ASM Instance Parameter and Mount the Disk Group

# Switch to the grid user
[root@rac1 ~]# su - grid

[grid@rac1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

-- Step 1: set asm_diskstring to the new path
SQL> ALTER SYSTEM SET asm_diskstring='/dev/rasm*';

System altered.

-- Step 2: mount the disk group that holds the OCR and voting disks
SQL> ALTER DISKGROUP systemdg MOUNT;

Diskgroup altered.

-- Step 3: verify the disk group state
SQL> SELECT NAME, STATE FROM V$ASM_DISKGROUP;

NAME            STATE
--------------- -----------
SYSTEMDG        MOUNTED

-- Step 4: check the new paths of the ASM disks
SQL> SELECT NAME, PATH, HEADER_STATUS FROM V$ASM_DISK
     WHERE GROUP_NUMBER > 0;

NAME         PATH              HEADER_STATUS
------------ ----------------- -------------
SYSTEMDG_0   /dev/rasm-diskb   MEMBER
SYSTEMDG_1   /dev/rasm-diskc   MEMBER
SYSTEMDG_2   /dev/rasm-diskd   MEMBER

3.7 Update the ASM SPFILE

Because ASM uses a shared SPFILE stored in an ASM disk group, the SPFILE must be updated as well:

-- Step 1: create a local SPFILE from memory
SQL> CREATE SPFILE FROM MEMORY;

File created.

-- Step 2: restart the ASM instance so the parameters take effect
SQL> STARTUP FORCE MOUNT;

ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

-- Step 3: verify the parameters
SQL> SHOW PARAMETER spfile

NAME     TYPE        VALUE
-------- ----------- ---------------------------------------------
spfile   string      /u01/app/11.2.0/grid/dbs/spfile+ASM1.ora

SQL> SHOW PARAMETER asm_diskstring

NAME             TYPE        VALUE
---------------- ----------- ------------------------------
asm_diskstring   string      /dev/rasm*

-- Step 4: create a PFILE as a backup
SQL> CREATE PFILE FROM SPFILE;

File created.

-- Step 5: store the SPFILE in the ASM disk group so that all nodes share it
SQL> CREATE SPFILE='+SYSTEMDG' FROM PFILE;

File created.

-- Step 6: restart ASM so that it uses the SPFILE stored in ASM
SQL> STARTUP FORCE;

ASM instance started
ASM diskgroups mounted

-- Step 7: confirm the SPFILE location
SQL> SHOW PARAMETER spfile

NAME     TYPE        VALUE
-------- ----------- ---------------------------------------------
spfile   string      +SYSTEMDG/cluster-name/asmparameterfile/registry.253.788682933

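3.8 Replace the Voting Disk Paths

Use the crsctl replace votedisk command to update the voting disks to the new ASM disk paths. ASM decides how many voting files to create from the disk group's redundancy (one for external, three for normal, five for high), so the count in your output may differ from the sample:

# Run as root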
[root@rac1 ~]# crsctl replace votedisk +SYSTEMDG

Successful addition of voting disk 864a00efcfbe4f42bfd0f4f6b60472a0.
Successful addition of voting disk ab14d6e727614f29bf53b9870052a5c8.
Successful addition of voting disk 754c03c168854f46bf2daee7287bf260.
Successful addition of voting disk 9ed58f37f3e84f28bfcd9b101f2af9f3.
Successful addition of voting disk 4ce7b7c682364f12bf4df5ce1fb7814e.
Successfully replaced voting disk group with +SYSTEMDG.
CRS-4266: Voting file(s) successfully replaced

3.9 Verify the Voting Disk and OCR Status

# Verify that the voting disks now use the new paths
[root@rac1 ~]# crsctl query css votedisk

##  STATE    File Universal Id                File Name            Disk group
--  -----    -----------------                ---------            ----------
 1. ONLINE   864a00efcfbe4f42bfd0f4f6b60472a0 (/dev/rasm-diskb)    [SYSTEMDG]
 2. ONLINE   ab14d6e727614f29bf53b9870052a5c8 (/dev/rasm-diskc)    [SYSTEMDG]
 3. ONLINE   754c03c168854f46bf2daee7287bf260 (/dev/rasm-diskd)    [SYSTEMDG]
 4. ONLINE   9ed58f37f3e84f28bfcd9b101f2af9f3 (/dev/rasm-diske)    [SYSTEMDG]
 5. ONLINE   4ce7b7c682364f12bf4df5ce1fb7814e (/dev/rasm-diskf)    [SYSTEMDG]
Located 5 voting disk(s).

# Verify the OCR status
[root@rac1 ~]# ocrcheck

Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2844
         Available space (kbytes) :     259276
         ID                       :  879001605
         Device/File Name         :  +SYSTEMDG
         Device/File integrity check succeeded
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
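
The voting disk check can also be scripted rather than read by eye. The function below is a minimal sketch that assumes the column layout of crsctl query css votedisk shown above (index, state, file ID, path in parentheses, disk group). The function name votedisk_ok is hypothetical.

```shell
#!/bin/sh
# votedisk_ok PREFIX
# Read `crsctl query css votedisk` output on stdin. Succeed only if at
# least one voting file is listed and every one is ONLINE with a path
# that starts with PREFIX.
# Assumption: columns are laid out as in the sample output above.
votedisk_ok() {
    awk -v prefix="$1" '
        $1 ~ /^[0-9]+\.$/ {          # data rows start with "1.", "2.", ...
            n++
            path = $4
            gsub(/[()]/, "", path)   # strip the surrounding parentheses
            if ($2 != "ONLINE" || index(path, prefix) != 1) bad++
        }
        END { if (n > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

For example, as root: crsctl query css votedisk | votedisk_ok /dev/rasm- && echo "voting files OK". A non-zero exit status means a voting file is offline or still listed under another path.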

3.10 Restart the CRS Services

# Stop the CRS stack that is running in exclusive mode
[root@rac1 ~]# crsctl stop crs

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-4133: Oracle High Availability Services has been stopped.

# Start CRS normally
[root@rac1 ~]# crsctl start crs

CRS-4123: Oracle High Availability Services has been started.
...

# Verify the CRS status
[root@rac1 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

# Check the status of all resources
[root@rac1 ~]# crsctl stat res -t

--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.SYSTEMDG.dg
               ONLINE  ONLINE       rac1
ora.DATADG.dg
               ONLINE  ONLINE       rac1
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1
ora.asm
               ONLINE  ONLINE       rac1                     Started
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1
ora.rac1.vip
      1        ONLINE  ONLINE       rac1
ora.prod.db
      1        ONLINE  ONLINE       rac1                     Open
      2        ONLINE  ONLINE       rac2                     Open
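
The crsctl check crs output from this step can also be checked mechanically, which is convenient when the same verification has to be repeated on several nodes. The function below is a sketch that assumes the message format shown above, where each healthy daemon is reported on a CRS-nnnn line ending in "is online". The function name crs_all_online is hypothetical.

```shell
#!/bin/sh
# crs_all_online: read `crsctl check crs` output on stdin and succeed
# only if at least one CRS-nnnn line is present and every such line
# ends with "is online".
# Assumption: message wording matches the sample output above.
crs_all_online() {
    awk '
        /^CRS-[0-9]+:/ {
            total++
            if ($0 ~ /is online[ \t]*$/) online++
        }
        END { if (total > 0 && online == total) exit 0; exit 1 }
    '
}
```

For example, as root: crsctl check crs | crs_all_online || echo "stack not fully up". The check is deliberately strict: any CRS line that does not report "is online" makes it fail.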

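4. Handling the Other Nodes

4.1 Advantage of a Shared SPFILE

Because ASM uses a shared SPFILE stored in an ASM disk group, asm_diskstring does not need to be changed separately on the other nodes. It is enough to:

  1. Make sure the UDEV rules (or the multipathing configuration) have been updated
  2. Reboot the node or rescan the disks
  3. Start the CRS services

4.2 Startup Steps on the Other Nodes

# Run on node 2
[root@rac2 ~]# ls -l /dev/*asm*
# Confirm that the new device names exist and have the correct permissions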

[root@rac2 ~]# crsctl start crs

# Verify the status
[root@rac2 ~]# crsctl check crs
[root@rac2 ~]# crsctl stat res -t

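5. Troubleshooting

5.1 Common Problems and Solutions

Symptom                               Likely cause                  Resolution
Voting file not found                 asm_diskstring not updated    Change the parameter and run replace votedisk as described in this article
Disk group cannot be mounted          Wrong disk permissions        Check the device owner and mode (grid:asmadmin, 660)
checksum failed for disk              Incorrect disk path mapping   Verify the old-to-new path mapping and check the multipathing configuration
ASM instance cannot start             SPFILE location problem       Start with a local PFILE and re-create the SPFILE
Other nodes cannot join the cluster   Inconsistent UDEV rules       Make sure all nodes have the same configuration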

5.2 Key Log Files

Log             Path                                                        Purpose
CSSD log        $GRID_HOME/log/<hostname>/cssd/ocssd.log                    Voting disk discovery problems
CRS log         $GRID_HOME/log/<hostname>/crsd/crsd.log                     CRS service startup problems
ASM alert log   $GRID_BASE/diag/asm/+asm/+ASM<n>/trace/alert_+ASM<n>.log    ASM instance and disk group problems
OHASD log       $GRID_HOME/log/<hostname>/ohasd/ohasd.log                   HAS service startup problems

5.3 Diagnostic Commands

# Check ASM disk discovery
kfod disks=all

# Check the diskstring in the GPnP profile
gpnptool get -o- | xmllint --format - | grep -i asm

# Check voting disk details
crsctl query css votedisk -verbose

# Check OCR backups
ocrconfig -showbackup

# Verify the cluster configuration
cluvfy comp crs -n all -verbose

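6. Best Practices

6.1 Before the Change

  • Full backup: back up the OCR (ocrconfig -export) and the ASM metadata
  • Record the current configuration: save the current asm_diskstring, the voting disk locations, and the disk mapping
  • Validate in a test environment: run through the whole procedure in a non-production environment first
  • Maintenance window: schedule enough downtime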

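6.2 During the Change

  • Make sure the disk path configuration is exactly the same on all nodes
  • The -excl -nocrs mode makes it possible to work on ASM without starting the full CRS stack
  • Make sure the ASM disk group is mounted properly before replacing the voting disks
  • Keeping the SPFILE in ASM avoids configuration drift between nodes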

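6.3 After the Change

  • CRS is healthy on all nodes
  • All ASM disk groups are mounted
  • Database instances are running normally
  • Listeners and VIPs are working
  • A full application test has been carried out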

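Summary

Changing ASM disk paths in an 11gR2 RAC environment comes down to the following key points:

  1. Understand the dependencies: CRS depends on ASM, and ASM depends on a correct asm_diskstring
  2. Handle the voting disks: besides changing asm_diskstring, the voting disk paths must be updated with crsctl replace votedisk
  3. Use exclusive mode: crsctl start crs -excl -nocrs is the key means of working on ASM when CRS cannot start normally
  4. Share the SPFILE: storing the ASM SPFILE in an ASM disk group simplifies multi-node configuration
  5. Test thoroughly: verify the environment fully both before and after the change

Following the steps in this article, the migration of ASM disk paths in an 11gR2 RAC environment can be completed safely.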